lxc / cgmanager
Control Group manager
Home Page: https://linuxcontainers.org/cgmanager
License: GNU Lesser General Public License v2.1
# Deprecated

Please note that the CGManager project has been deprecated in favor of using the kernel's CGroup Namespace or lxcfs' simulated cgroupfs. See https://s3hh.wordpress.com/2016/06/18/whither-cgmanager/ for details.

=== Intro ===

This is a motivation, description and explanation of the cgmanager design. The original design RFC was described here:

http://lwn.net/Articles/575672/
http://lwn.net/Articles/575683/

And much of it still holds (and is cut-pasted, though edited, here).

=== Cgmanager Design ===

One of the driving goals is to enable nested lxc as simply and safely as possible. If this project is a success, then a large chunk of code can be removed from lxc. I'm considering this project a part of the larger lxc project, but given how central it is to systems management that doesn't mean that I'll consider anyone else's needs as less important than our own.

This document consists of two parts. The first describes how I intend the daemon (cgmanager) to be structured and how it will enforce the safety requirements. The second describes the commands which clients will be able to send to the manager. The list of controller keys which can be set is very incomplete at this point, serving mainly to show the approach I was thinking of taking.

=== Summary ===

Each 'host' (identified by a separate instance of the linux kernel) has exactly one running daemon to manage control groups. This daemon answers cgroup management requests over a dbus socket, located at /sys/fs/cgroup/cgmanager/sock. The /sys/fs/cgroup/cgmanager directory can be bind-mounted into various containers, so that one daemon can support the whole system. (Bind-mounting the directory rather than the socket itself allows a container to proceed if the cgmanager is restarted, creating a new socket.)

Outline:
  . A single manager, cgmanager, is started on the host, very early during boot. It has very few dependencies, and requires only /proc, /run, and /sys to be mounted, with /etc ro.
    It mounts the cgroup hierarchies in a private namespace and sets defaults for clone_children and use_hierarchy. It opens a Unix socket at /sys/fs/cgroup/cgmanager/sock.
  . A client (requestor 'r') can make cgroup requests over /sys/fs/cgroup/cgmanager/sock using dbus calls. Detailed privilege requirements for r are listed below.
  . The client request will pertain to an existing or new cgroup A. r's privilege over the cgroup must be checked. r is said to have privilege over A if A is owned by r's uid, or if A's owner is mapped into r's user namespace, and r is root in that user namespace.
  . The client request may pertain to a victim task v, which may be moved to a new cgroup. In that case r's privilege over both the cgroup and v must be checked. r is said to have privilege over v if v is mapped in r's pid namespace, v's uid is mapped into r's user ns, and r is root in its userns. Or if r and v have the same uid and v is mapped in r's pid namespace.
  . r's credentials will be taken from socket's peercred, ensuring that pid and uid are translated.
  . A request to chown a cgroup requires a uid U and gid G.
  . If r is in the same pid and user namespaces as the cgmanager, then v, U and G can be passed as integer arguments over the D-Bus requests.
  . If r is not in the same namespaces as the cgmanager, then v, U and G must be passed as SCM_CREDENTIALs so that the cgmanager receives the translated global pid/uid/gid. Since D-Bus does not support sending SCM_CREDENTIALs as part of a D-Bus message, the D-Bus arguments include a file descriptor. The SCM_CREDENTIALs are sent over the file descriptor after the D-Bus transaction completes, and the final result is sent over the same file descriptor.
  . It is desirable that all transactions can be accomplished with simple D-Bus transactions. Therefore a cgroup manager proxy (cgproxy) is provided.
    This will move /sys/fs/cgroup/cgmanager to /sys/fs/cgroup/cgmanager.lower, then serve as a proxy translating D-Bus requests received on /sys/fs/cgroup/cgmanager/sock into SCM-enhanced D-Bus requests on /sys/fs/cgroup/cgmanager.lower/sock.
  . In plain D-Bus transactions, the requestor r's credentials are read from the socket.
  . In SCM-enhanced D-Bus transactions, the proxy p's credentials are read from the socket. The requestor's credential is sent as an SCM_CREDENTIAL.

Privilege requirements by action:
  * Requestor of an action (r) over a socket may only make changes to cgroups over which it has privilege.
  * Requestors may be limited to a certain #/depth of cgroups (to limit memory usage). This is not yet implemented.
  * Cgroup hierarchy is responsible for resource limits. To this end, a request to chown cgroup A to uid U will only chown the directory itself (allowing child cgroup creation) and the tasks and cgroup.procs file.
  * A requestor must either be uid 0 in its userns with victim mapped into its userns, or the same uid and in same/ancestor pidns as the victim
  * If r requests creation of cgroup '/x', /x will be interpreted as relative to r's cgroup. r cannot make changes to cgroups not under its own current cgroup.
  * Root in the cgmanager's pid namespace may 'escape' to the cgmanager's cgroup with a special MovePidAbs command.
  * A proxy may move a task over which it has privilege to the proxy's own cgroup. This allows the proxy to mimic the cgmanager's special root-may-escape semantics in its own container.
  * If r requests creation of cgroup '/x', it must have write access to its own cgroup.
  * if r requests setting a limit under /x, then
    . either r must be root in its own userns, and UID(/x) be mapped into its userns, or else UID(r) == UID(/x)
    . /x must not be / (not strictly necessary, all users know to ensure an extra cgroup layer above '/')
    . setns(UIDNS(r)) would not work, due to in-kernel capable() checks which won't be satisfied.
      Therefore we'll need to do privilege checks ourselves, then perform the write as the host root user. (see devices.allow/deny). Further we need to support older kernels which don't support setns for pid.

Types of requests:
  * r requests creating cgroup A'/A
    . lmctfy/cli/commands/create.cc
    . Verify that UID(r) mapped to 0 in r's userns
    . R=cgroup_of(r)
    . Verify that UID(R) is mapped into r's userns
    . Create R/A'/A
    . chown R/A'/A to UID(r)
  * r requests to move task x to cgroup A.
    . lmctfy/cli/commands/enter.cc
    . r must send PID(x) as ancillary message
    . Verify that UID(r) mapped to 0 in r's userns, and UID(x) is mapped into that userns (is it safe to allow if UID(x) == UID(r))?
    . R=cgroup_of(r)
    . Verify that R/A is owned by UID(r) or UID(x)? (not sure that's needed)
    . echo PID(x) >> /R/A/tasks
  * r requests chown of cgroup A to uid X
    . X is passed in ancillary message
      * ensures it is valid in r's userns
      * maps the userid to host for us
    . Verify that UID(r) mapped to 0 in r's userns
    . R=cgroup_of(r)
    . Chown R/A to X
  * r requests cgroup A's 'property=value'
    . Verify that either
      * A != ''
      * UID(r) == 0 on host
      In other words, r in a userns may not set root cgroup settings.
    . Verify that UID(r) mapped to 0 in r's userns
    . R=cgroup_of(r)
    . Set property=value for R/A
      * Expect kernel to guarantee hierarchical constraints
  * r requests deletion of cgroup A
    . lmctfy/cli/commands/destroy.cc (without -f)
    . same requirements as setting 'property=value'
  * r requests purge of cgroup A
    . lmctfy/cli/commands/destroy.cc (with -f)
    . same requirements as setting 'property=value'

Long-term we will want the cgroup manager to become more intelligent - to place its own limits on clients, to address cpu and device hotplug, etc. Since we will not be doing that in the first prototype, the daemon will not keep any state about the clients.

=== Another look at the safety of requests ===

Notes:
  1. In a plain D-Bus call, the proxy is the requestor.
  2. If a client does an SCM call to the cgmanager socket, then the proxy is the requestor.
  3. In any call over a proxy, the proxy won't be able to make changes outside its own cgroups. If it misbehaves, damage is contained so it only damages itself.
  4. Chained proxying is not supported. If a proxy gets a request where proxy != requestor, the call is rejected.
  5. The identity of the proxy (which may be the requestor) cannot be forged; it is taken from the socket credential. A more privileged user must not allow a less privileged task to have access to the opened DBus socket, as the credential will be that at the time of connect().

On newer kernels, cgmanager can tell whether a proxy or requestor is in the same namespace as itself. On older kernels, it cannot.
  . for Create, this is ok. We have the proxy's real pid and can constrain create under its cgroup.
  . for getPidCgroup, we can ensure that only results under the parent's cgroup are returned. We can NOT ensure that results will make sense for plain DBus calls, as we cannot guarantee that proxy is in the same ns as cgmanager. However, this is not unsafe. When we can and do detect that p is in a different pid namespace, then we reject the call, because the result cannot be sensible.
  . for chmod: We constrain under proxy's cgroup, so this is safe.
  . for chown: on older kernel we cannot guarantee that the uid/gid make sense on the host; However
    . root on the root host - no translation necessary
    . root in a non-user-ns container: no translation necessary
    . root in an unprivileged container: won't have privilege to do any chown without going through a proxy.
    Therefore rejecting calls from another namespace is not necessary. The worst it will do is to give -EPERM for calls which root in an unprivileged container would otherwise be allowed to make.
  . movepid:
    . root on root host - fine
    . root in a non-user-ns container: we can only ensure that the victim be under the proxy's cgroup.
      If that is the case, then root (which is also root on the host) is allowed to move the task. When we can and do detect a different pid namespace, then we reject the call because the results cannot make sense.
  . MovePidAbs: On an older kernel, or if the task is in a different namespace, then this requires a proxy. The cgmanager will only allow escaping up to the level of the proxy.
    . root on root host - allowed to escape.
    . root in a non-user-ns container: allowed to escape up to the proxy's level. If the host misconfigures the container so that the host's proxy is in the container, then root can escape completely.
    . if root tries to mimic a proxy, then it can only escape to the proxy's level - its own. So it cannot escape at all.
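
The two privilege rules from the outline (privilege over a cgroup A, and privilege over a victim task v) can be modelled in a few lines. What follows is only an illustrative sketch in Python, not cgmanager's C implementation. The names `Requestor`, `host_to_ns`, `may_manage_cgroup` and `may_move_task` are hypothetical, and a user namespace is assumed to be described by ranges laid out like the lines of /proc/<pid>/uid_map.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One mapping range, in the field order of a /proc/<pid>/uid_map line:
# (first uid inside the namespace, first uid on the host, length of range).
UidRange = Tuple[int, int, int]


def host_to_ns(uid_map: List[UidRange], host_uid: int) -> Optional[int]:
    """Translate a host uid into the namespace, or None if it is unmapped."""
    for ns_start, host_start, count in uid_map:
        if host_start <= host_uid < host_start + count:
            return ns_start + (host_uid - host_start)
    return None


@dataclass
class Requestor:
    """Hypothetical stand-in for r: its host uid and its userns mapping."""
    host_uid: int
    uid_map: List[UidRange]

    def is_ns_root(self) -> bool:
        # r is root in its userns if its host uid maps to uid 0 there.
        return host_to_ns(self.uid_map, self.host_uid) == 0


def may_manage_cgroup(r: Requestor, cgroup_owner_uid: int) -> bool:
    """A is owned by r's uid, or A's owner is mapped and r is ns root."""
    if cgroup_owner_uid == r.host_uid:
        return True
    return r.is_ns_root() and host_to_ns(r.uid_map, cgroup_owner_uid) is not None


def may_move_task(r: Requestor, victim_uid: int, victim_in_pidns: bool) -> bool:
    """v must be visible in r's pidns; then same uid, or mapped and ns root."""
    if not victim_in_pidns:
        return False
    if victim_uid == r.host_uid:
        return True
    return r.is_ns_root() and host_to_ns(r.uid_map, victim_uid) is not None


# Example requestors (made-up uids): root of an unprivileged container whose
# uids 0..65535 map to host uids 100000..165535, and a plain host user.
container_root = Requestor(host_uid=100000, uid_map=[(0, 100000, 65536)])
plain_user = Requestor(host_uid=1000, uid_map=[(0, 0, 4294967295)])
```

In this model the container's root may manage a cgroup owned by host uid 100500 (mapped into its namespace) but not one owned by host uid 0, while the plain user may only manage cgroups it owns itself.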
I am using Ubuntu 14.04.2, which provides cgmanager 0.24, but that tag is missing from the repository. It is probably commit 9c54196.
The order of declarations in the configure.ac script leads to the generated configure script trying to execute programs in my home directory due to it using an as-yet undeclared shell variable ac_aux_dir. This is the root cause of the complaint issued by the configure script during execution along the lines of:
/bin/sh: /home/stewart/missing: No such file or directory
configure: WARNING: 'missing' script is too old or missing
The generated script does: am_aux_dir=`cd $ac_aux_dir && pwd`
but since ac_aux_dir is not defined, this ends up pointing at my home directory. It then attempts to run 'missing' from that directory.
One possible simple fix might be to move the AM_INIT_AUTOMAKE to just before AC_GNU_SOURCE. I'm not sure if this is the best fix (but it certainly solves the problem for me)
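
For anyone who wants to see the effect in isolation: a `cd` with an empty, unquoted argument falls back to $HOME, which is why the home directory shows up. Below is a small Python sketch that runs the same shell expression with and without the variable set. It assumes a POSIX `sh` on PATH; the helper name `resolve_aux_dir` and the temporary stand-in for the home directory are made up for the demonstration.

```python
import os
import subprocess
import tempfile


def resolve_aux_dir(env):
    """Run the shell expression from the generated configure script."""
    result = subprocess.run(
        ["sh", "-c", "cd $ac_aux_dir && pwd"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# A throwaway directory stands in for the home directory (assumption for
# the demo); realpath avoids surprises if the temp dir is behind a symlink.
fake_home = os.path.realpath(tempfile.mkdtemp())
base_env = {"HOME": fake_home, "PATH": os.environ.get("PATH", "/usr/bin:/bin")}

# ac_aux_dir unset: the expression degenerates to a bare "cd", i.e. cd $HOME.
unset_result = resolve_aux_dir(base_env)

# ac_aux_dir set: the expression resolves to the intended directory.
set_result = resolve_aux_dir({**base_env, "ac_aux_dir": "/"})

print("unset:", unset_result)
print("set:  ", set_result)
```

With the variable unset the printed directory is the stand-in home directory, which matches the path in the warning above.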
People really want a libcgroup-style boot time configuration. Come up with a configuration file format and implement the setup-at-startup.
Nov 19 00:49:43 xor systemd[1]: Started Cgroup management daemon.
Nov 19 00:49:43 xor cgmanager[9666]: cgmanager: Error mounting unified: No such file or directory
Nov 19 00:49:43 xor cgmanager[9666]: cgmanager: failed to collect cgroup subsystems
Nov 19 00:49:43 xor systemd[1]: cgmanager.service: Main process exited, code=exited, status=1/FAILURE
Nov 19 00:49:43 xor systemd[1]: cgmanager.service: Unit entered failed state.
Nov 19 00:49:43 xor systemd[1]: cgmanager.service: Failed with result 'exit-code'.
Nov 19 00:49:43 xor systemd[1]: cgmanager.service: Service hold-off time over, scheduling restart.
Nov 19 00:49:43 xor systemd[1]: Stopped Cgroup management daemon.
Not really sure what this is about. I'm on kernel 4.8.8 and systemd ver 232 (if that's relevant)
Sorry, I'm quite new at this, so I'm surely making a mistake.
But I cannot find a configure script or autogen.sh, as I am used to in other packages.
So I can't ./configure cgmanager as INSTALL tells me to do.
Do you have a suggestion?
I would like to point out that an identifier like "__frontend_h" does not fit the expected naming convention of the C language standard.
Would you like to adjust your selection for unique names?
When no release agent is present, if you prune a cgroup which has tasks, then when those tasks die the cgroup is not removed.
Unfortunately the cgroup.populated file which would be ideal for watching for cgroups which become empty is only available in the default unified hierarchy. So we will need to do something different like periodically go over a list of to-be-removed cgroups and check whether they are now empty.
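
A rough sketch of what such a periodic pass could look like, written in Python purely for illustration (the daemon itself is C). It assumes the v1 layout where each cgroup directory has a `tasks` file; the function names and the injectable `remove` callback are hypothetical. On a real cgroup filesystem plain `os.rmdir` is enough, whereas the demo at the bottom uses ordinary temp directories and therefore passes `shutil.rmtree`.

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable, List


def cgroup_is_empty(path: str) -> bool:
    """True if the v1 'tasks' file of this cgroup lists no tasks."""
    try:
        with open(os.path.join(path, "tasks")) as f:
            return f.read().strip() == ""
    except FileNotFoundError:
        return False


def reap_once(pending: Iterable[str],
              remove: Callable[[str], None] = os.rmdir) -> List[str]:
    """One pass over the to-be-removed list; returns what is still pending."""
    still_pending = []
    for path in pending:
        if not os.path.isdir(path):
            continue  # already gone, nothing left to track
        if cgroup_is_empty(path):
            try:
                remove(path)
                continue
            except OSError:
                pass  # e.g. a task raced in, or child cgroups remain
        still_pending.append(path)
    return still_pending


# Demo on ordinary directories (not a real cgroup mount): one "cgroup"
# still has a task, the other is empty and gets removed.
demo_root = tempfile.mkdtemp()
busy = os.path.join(demo_root, "busy")
idle = os.path.join(demo_root, "idle")
for path, tasks in ((busy, "1234\n"), (idle, "")):
    os.mkdir(path)
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(tasks)

remaining = reap_once([busy, idle], remove=shutil.rmtree)
print(remaining)
```

A caller would run `reap_once` on a timer and feed the returned list back in on the next tick.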
Found weird behaviour: when cgmanager is running, memory restrictions for containers don't work.
No change to how cgroups are mounted into the container affects this behaviour; I tried lxc.mount.auto with cgroup:mixed, cgroup:ro and cgroup:rw. With any value, the restrictions are lost. Containers without systemd (tested with ubuntu 10.04) are not affected.
After some debugging I found that the problem is the unsharing of cgmanager's mounts (fs.c: setup_cgroup_mounts()).
lxc.cgroup.memory.limit_in_bytes = 4G
cat /sys/fs/cgroup/memory/lxc/container/memory.limit_in_bytes, it would be 4G as we set.
Host OS: Arch Linux (kernel 3.19.3-3-apparmor)
Guest OS: Arch Linux
lxc version: 1.0.6.61.ga44cafd-1 (also tried 1.1.2.9.g17f48b9-1)
cgmanager version: 0.37-1
full lxc config:
# Config for container 'atest'
lxc.utsname = atest
lxc.rootfs = /var/lib/heaver/instances/abox.atest
lxc.aa_profile = lxc-container
lxc.console = /var/lib/lxc/atest/console
# mounts
lxc.mount.entry = /var/lib/heaver/instances/abox.atest/binds/dev /var/lib/heaver/instances/abox.atest/dev none bind 0 0
lxc.mount.entry = devpts /var/lib/heaver/instances/abox.atest/dev/pts devpts newinstance,ptmxmode=0666,nosuid,noexec 0 0
lxc.mount.auto = cgroup:mixed proc:mixed sys:rw
# net devices
lxc.network.type = veth
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.flags = up
lxc.network.hwaddr = 2:85:c0:a8:97:35
lxc.network.ipv4 = 192.168.151.53/18
lxc.network.ipv4.gateway = 192.168.128.1
# limits
lxc.cgroup.cpu.cfs_period_us = 100000
lxc.cgroup.cpu.cfs_quota_us = 400000
lxc.cgroup.cpuset.cpus = 1,2,3
lxc.cgroup.memory.limit_in_bytes = 4G
# raw lxc config values
lxc.tty = 0
lxc.kmsg = 0
lxc.cgroup.devices.deny = a
lxc.cap.drop = sys_time
lxc.pts = 1024
lxc.cgroup.memory.oom_control = 1
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
lxc.cgroup.devices.allow = c 10:200 rwm
lxc.cgroup.devices.allow = c 10:229 rwm
lxc.cgroup.devices.allow = c 254:0 rwm
I am trying cgmanager and lxcfs on archlinuxarm using these systemd units:
/usr/lib/systemd/system/cgmanager.service
[Unit]
Description=Cgroup management daemon
ConditionVirtualization=!container
Before=cgproxy.service
After=local-fs.target
[Service]
Type=simple
ExecStart=/usr/bin/cgmanager -m name=systemd
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
/usr/lib/systemd/system/lxcfs.service
[Unit]
Description=FUSE filesystem for LXC
ConditionVirtualization=!container
Before=lxc.service
After=cgmanager.service
Requires=cgmanager.service
[Service]
ExecStart=/usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs
KillMode=process
Restart=on-failure
ExecStopPost=-/bin/fusermount -u /var/lib/lxcfs
[Install]
WantedBy=multi-user.target
But when I start lxcfs.service, cgmanager.service reports the following error:
Could not get password database information for UID of current process: User "???" unknown or no memory to allocate password entry
I don't get cgmanager started in daemon mode.
cgmanager is used by consolekit2 to track sessions.
/run/cgmanager.pid is missing, yet cgmanager is running as a process.
Strangely, this only occurs on real hardware; it somehow works in vbox.
If I start on real hardware from a livecd, cgmanager starts properly with a pid file; after install, the pid file goes missing.
OS: Arch/Manjaro
Kernel: 4.1
init: openrc + eudev
cpu: i3 sandybridge
cgmanager-0.37, I also tried 0.38, same problem.
strace -f -o/tmp/strace.cgm.out /usr/bin/cgmanager -m name=systemd --debug > /tmp/cgm.out 2>&1 &
If the / fs is mounted with the --make-rprivate flag (it's needed for lxc), cgmanager holds mounts. The problem appears in cgmanager 0.35. In this version it happens only if cgmanager is started after the fs is mounted (not sure how it works in previous versions). Here are the steps to reproduce the problem:
# mount --make-rprivate /
# dd if=/dev/zero of=/tmp/test-drive bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0506839 s, 2.1 GB/s
# mkfs.ext4 /tmp/test-drive
mke2fs 1.42.11 (09-Jul-2014)
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: bf07cf73-65b2-4bcf-bce3-f005e83e1c93
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: don
# cgmanager&
[1] 2038
# mount /tmp/test-drive /mnt
# umount /mnt
# grep loop /proc/*/mounts | wc -l
0
# kill 2038
# mount /tmp/test-drive /mnt
# cgmanager&
[1] 2157
# umount /mnt
# grep loop /proc/*/mounts | wc -l
1
# grep loop /proc/*/mounts
/proc/2157/mounts:/dev/loop0 /mnt ext4 rw,relatime,data=ordered 0 0
# ps 2157
PID TTY STAT TIME COMMAND
2157 pts/0 SN 0:00 cgmanager
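
The `grep loop /proc/*/mounts` check above can also be scripted, which is handy when hunting for whichever process still pins a mount. Here is a minimal Python sketch; the function name `holders_of` is made up, it matches on the device field only (slightly stricter than the grep), and the demo runs against a fake proc tree built in a temp directory rather than the real /proc.

```python
import glob
import os
import tempfile


def holders_of(device_substr, proc_root="/proc"):
    """Map pid -> mount lines whose device field contains device_substr."""
    holders = {}
    for mounts_path in glob.glob(os.path.join(proc_root, "[0-9]*", "mounts")):
        pid = int(os.path.basename(os.path.dirname(mounts_path)))
        try:
            with open(mounts_path) as f:
                lines = [line.rstrip("\n") for line in f
                         if device_substr in line.split(" ", 1)[0]]
        except OSError:
            continue  # the process exited, or we may not read its mounts
        if lines:
            holders[pid] = lines
    return holders


# Fake proc tree for the demo, reusing the mount line from the report.
fake_proc = tempfile.mkdtemp()
samples = {
    "1": "proc /proc proc rw,relatime 0 0\n",
    "2157": "proc /proc proc rw,relatime 0 0\n"
            "/dev/loop0 /mnt ext4 rw,relatime,data=ordered 0 0\n",
}
for pid, content in samples.items():
    os.mkdir(os.path.join(fake_proc, pid))
    with open(os.path.join(fake_proc, pid, "mounts"), "w") as f:
        f.write(content)

holders = holders_of("loop", proc_root=fake_proc)
print(holders)
```

Run as root against the real /proc (the default `proc_root`), it should list the same process that the grep finds.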
Last version (0.41) was released more than 3 years ago.
Since then cgmanager gained important fixes, related to
compatibility with 4.5+ kernels and performing operations on cgroups v2.
Some new functionality was also added since, for example PAM_CGM
module was made much more configurable.
Due to the above, it would be good to make a new cgmanager release.
It should be possible to say that a cgroup should have "all" or "50%" of memory or cpus. When a cpu hotplug occurs, any cgroup which has all or a percentage of the cpus should get its cpuset recalculated.
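
To make the request concrete, here is one possible way the recalculation could work, sketched in Python. Everything in it is an assumption for discussion: the spec strings "all" and "N%", rounding down with a minimum of one cpu, picking the lowest-numbered online cpus, and the function names.

```python
import math
from typing import List


def cpus_for_spec(spec: str, online: List[int]) -> List[int]:
    """Resolve 'all' or 'N%' against the currently online cpus."""
    online = sorted(online)
    if spec == "all":
        return online
    if spec.endswith("%"):
        pct = float(spec[:-1])
        if not 0 < pct <= 100:
            raise ValueError(f"percentage out of range: {spec}")
        # Assumed policy: round down, but never hand out fewer than one cpu.
        count = max(1, math.floor(len(online) * pct / 100))
        return online[:count]
    raise ValueError(f"unsupported spec: {spec}")


def format_cpuset(cpus: List[int]) -> str:
    """Render a cpu list in cpuset.cpus syntax, e.g. [0,1,2,3,5] -> '0-3,5'."""
    cpus = sorted(set(cpus))
    parts = []
    i = 0
    while i < len(cpus):
        j = i
        while j + 1 < len(cpus) and cpus[j + 1] == cpus[j] + 1:
            j += 1
        parts.append(str(cpus[i]) if i == j else f"{cpus[i]}-{cpus[j]}")
        i = j + 1
    return ",".join(parts)


# Before hotplug: 4 cpus online. After hotplug: 8 cpus online.
before = format_cpuset(cpus_for_spec("50%", [0, 1, 2, 3]))
after = format_cpuset(cpus_for_spec("50%", list(range(8))))
print(before, after)
```

On a hotplug event the manager would re-run `cpus_for_spec` for every cgroup that was configured with a relative spec and write the formatted result back to its cpuset.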
Cgmanager on CentOS 7 doesn't work. Has anyone tried compiling it?
cgmanager wants to install parts of itself in /usr/share, even when ./configure --prefix=... flag tells it something else:
$ ./configure --prefix=$PWD/_install
...
$ make
$ make install
...
/nix/store/wc472nw0kyw0iwgl6352ii5czxd97js2-coreutils-8.23/bin/mkdir -p '/home/bfo/code/forks/cgmanager/_install/bin'
/nix/store/r5sxfcwq9324xvcd1z312kb9kkddqvld-bash-4.3-p30/bin/bash ./libtool --mode=install /nix/store/wc472nw0kyw0iwgl6352ii5czxd97js2-coreutils-8.23/bin/install -c cgm '/home/bfo/code/forks/cgmanager/_install/bin'
libtool: install: /nix/store/wc472nw0kyw0iwgl6352ii5czxd97js2-coreutils-8.23/bin/install -c .libs/cgm /home/bfo/code/forks/cgmanager/_install/bin/cgm
/nix/store/wc472nw0kyw0iwgl6352ii5czxd97js2-coreutils-8.23/bin/mkdir -p /usr/share/cgmanager/tests
/nix/store/wc472nw0kyw0iwgl6352ii5czxd97js2-coreutils-8.23/bin/mkdir: cannot create directory '/usr/share': Permission denied
The last two cgmanager releases (0.38, 0.39) fail configuration if pam is not found, which is the case for (at least) Slackware. I am able to remove the pam related stuff from configure.ac & Makefile.am, after which bootstrap.sh/configure/make all works as before. However I'm unsure whether this is safe - I'd feel better if there were a --without-pam configuration option, or even allowing --with-pamdir=none. Is this possible please?
Thanks,
chris
It seems that there is no way to move all the threads of a process to a cgroup when not in unified mode, since in that case cgmanager writes the PID (which could be a TGID) to tasks and not to cgroup.procs. In unified mode the PID is written to cgroup.procs.
Could a way be added to write to cgroup.procs in non-unified mode, or could that be made the default?
Just for easy reference, the relevant part of cgroups.txt:
 - tasks: list of tasks (by PID) attached to that cgroup. This list
   is not guaranteed to be sorted. Writing a thread ID into this file
   moves the thread into this cgroup.
 - cgroup.procs: list of thread group IDs in the cgroup. This list is
   not guaranteed to be sorted or free of duplicate TGIDs, and userspace
   should sort/uniquify the list if this property is required.
   Writing a thread group ID into this file moves all threads in that
   group into this cgroup.
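
To illustrate the difference being asked for, a tiny Python sketch of the two write targets follows. The helper names and the `whole_process` switch are hypothetical and not an existing cgmanager option; the demo writes into a temp directory instead of a real cgroup mount.

```python
import os
import tempfile


def membership_file(cgroup_dir: str, whole_process: bool) -> str:
    """cgroup.procs moves a whole thread group, tasks moves one thread."""
    return os.path.join(cgroup_dir, "cgroup.procs" if whole_process else "tasks")


def move(cgroup_dir: str, task_id: int, whole_process: bool = True) -> str:
    """Write the id to the chosen file and return the path that was used."""
    path = membership_file(cgroup_dir, whole_process)
    with open(path, "w") as f:
        f.write(f"{task_id}\n")
    return path


# Demo against a temp directory standing in for a v1 cgroup directory.
demo_cgroup = tempfile.mkdtemp()
procs_path = move(demo_cgroup, 4242, whole_process=True)    # whole thread group
tasks_path = move(demo_cgroup, 4243, whole_process=False)   # a single thread
print(procs_path, tasks_path)
```

Per the excerpt above, writing a TGID to cgroup.procs moves every thread of that process, which is the behaviour requested here for non-unified mode.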
Using cgmanager-0.39, this occurs:
root@liberty:~# cgmanager --daemon name=cgmanager
root@liberty:~# cgmanager: Unable to write pid file: No such file or directory
root@liberty:~#
I found #11 but that appears to have been fixed after 0.37, and I confirmed that the lines removed in that commit are indeed not present in 0.39, so it's unclear what's happening here. The cgmanager daemon is starting up:
root@liberty:~# ps aux | grep [c]gmanager
root 22472 0.0 0.0 15312 140 ? S 23:41 0:00 cgmanager --daemon name=cgmanager
but no pidfile is showing up in /run.
I've searched through the 0.39 code and I don't even see that a pidfile location is defined anywhere, so I'm guessing that it somehow comes from libnih - however, it looks like that would be done with nih_main_set_pidfile() and that's not present anywhere in cgmanager code either, best I can tell.
What am I missing here?
The reasons are explained here: https://s3hh.wordpress.com/2016/06/18/whither-cgmanager/ ...so a link to this post would be fine.
Hi,
I just installed the package cgmanager 0.41 from ppa:ubuntu-lxc/cgmanager-stable.
However, when I run cgm --version, it says 0.29.
I believe this is due to cgm.c, line 493:
void print_version(void)
{
	printf("0.29");
	exit(0);
}
I tried to run lxcfs v0.10 and cgmanager v0.39 with the unified cgroup hierarchy with systemd 226 by passing systemd.unified_cgroup_hierarchy=1 to systemd. This doesn't seem to work. The assertion nih_assert(server != NULL); in cgmanager.c fails. This might have something to do with how systemd 226 starts the dbus session, but I am not sure:
"systemd now supports the concept of user buses replacing
session buses, if used with dbus-1.10 (and enabled via dbus
--enable-user-session). It previously only supported this on
kdbus-enabled systems, and this release expands this to
'dbus-daemon' systems." (http://lists.freedesktop.org/archives/systemd-devel/2015-September/034177.html)
Hi,
Upgrading to cgmanager 0.41 broke OpenRC systems that use unprivileged LXC containers. I built the upstream code using --with-init-script=openrc, and I run it from the command line like this:
# ./cgmanager -m name=systemd
cgmanager: Error mounting unified: No such file or directory
cgmanager: failed to collect cgroup subsystems
Dropping the systemd piece doesn't make any difference:
# ./cgmanager
cgmanager: Error mounting unified: No such file or directory
cgmanager: failed to collect cgroup subsystems
Thanks!
I, and others, have been using our own init scripts on Slackware for some time. Since I noticed that they can be supported in cgmanager code base itself, I have made some init scripts and autotools changes to incorporate them. Please refer to pull request #15.