projectatomic / atomic-site
Source code for projectatomic.io
Home Page: projectatomic.io
License: Other
Docker has added a new feature to allow alternate logging drivers.
Currently, in docker-1.6, it supports json-file (the old default), syslog, or even none.
If you ran your docker daemon with the --log-driver=syslog option, any output from a container would go directly to the system syslog daemon, and usually end up in /var/log/messages.
I added a patch to support journald as a logging driver, and it was recently merged.
In docker-1.7 you will have the option to use --log-driver=journald either on your docker daemon, to send all of your container logging directly to the journal, or on a docker run/create line, so individual containers can run with different logging drivers.
To test this I set up my docker daemon to run with the following options.
docker -d --selinux-enabled --log-driver=journald
Now I run a simple container.
docker run fedora echo "Dan Walsh was here"
Then I look up the container ID for the newly created container, and I can use journalctl to examine the content.
journalctl MESSAGE_ID=cdf02c627e27
-- Logs begin at Mon 2015-04-06 16:06:42 EDT, end at Fri 2015-04-24 08:40:38 EDT. --
Apr 20 15:06:39 dhcp-10-19-62-196.boston.devel.redhat.com docker[27792]: Dan Walsh was here
Currently the docker logs command only works with the json-file backend, but I hope to eventually get the docker daemon to communicate with journald to retrieve this information. Of course, if anyone wants to take a stab at this, I would welcome your effort.
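A rough sketch of what such an integration could look like, shelling out to journalctl's JSON output. Note that journal_messages_for is a hypothetical helper, not an existing docker API; it matches on the same field shown in the journalctl invocation above.

```python
import json
import subprocess

def parse_journal_lines(lines):
    """Extract the MESSAGE field from journalctl's JSON-per-line output."""
    return [json.loads(line)["MESSAGE"] for line in lines if line.strip()]

def journal_messages_for(container_id):
    """Fetch a container's log lines from the journal (hypothetical helper,
    matching on the same field used in the journalctl example above)."""
    out = subprocess.run(
        ["journalctl", "MESSAGE_ID=" + container_id, "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_journal_lines(out.splitlines())
```

The parser is separated from the journalctl call so it can be exercised without a running journal.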
Although a feed is generated, it's not found in the website's HTML code, making it harder to add to RSS readers.
There doesn't appear to be any documentation at http://www.projectatomic.io/docs/gettingstarted/ or http://www.projectatomic.io/docs/cockpit/ or elsewhere with information about how to initially log in to cockpit.
Page http://www.projectatomic.io/docs/atomicapp/Providers.asciidoc does not exist.
From page: http://www.projectatomic.io/docs/atomicapp/ under "Provider description".
Taking a look at http://www.projectatomic.io/docs/ I can't find any link to http://www.projectatomic.io/docs/nulecule/
We need to add license info to all README files, and also a license document in each repo.
Since you said you wanted to work on this, I'm assigning to you. Lemme know if you want to toss it back my way.
Summarizing a couple weeks of emails:
The team writing Container Best Practices wants to enable an automated CI push for docs to show up at /docs/container-best-practices.html. This is a reasonable request, so we'd like to accommodate it. Here's my idea on getting it to work:
The problem is that I'm a bit stuck on how to accomplish some of the above steps, due to my unfamiliarity with middleman and OpenShift Online v2 permissions. Help?
I have Fedora 22 Atomic mostly working as a guest on Windows 8.1 Client Hyper-V. If there's any interest, I can contribute documentation on how to do it.
The way it is now:
If I type a subdir without the "www", I get redirected to the projectatomic home page:
The way it should be:
I should get redirected to the page indicated.
This isn't a big deal, but then it might not be that hard to fix, either.
We recently added support for no_new_privs to docker. It was also earlier added to runc and the Open Container Initiative spec.
This security feature was added to the Linux kernel back in 2012. A process can set the no_new_privs bit in the kernel, and it persists across fork, clone, and execve. The no_new_privs bit ensures that the process and its children cannot gain any additional privileges. A process is not allowed to unset the bit once it is set. Processes with no_new_privs are not allowed to change uid/gid or gain any other capabilities, even if they execute setuid binaries or executables with file capability bits set. no_new_privs also prevents LSMs like SELinux from transitioning to process labels that have access not allowed to the current process. This means an SELinux process is only allowed to transition to a process type with fewer privileges.
For more details, see the kernel documentation.
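The bit can be set from user space with prctl(2); here is a minimal sketch using Python's ctypes (PR_SET_NO_NEW_PRIVS is value 38 in <linux/prctl.h>; Linux only):

```python
import ctypes

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>

def set_no_new_privs():
    """Set the no_new_privs bit on the current process (Linux only).
    Once set, it persists across fork/clone/execve and cannot be cleared."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NO_NEW_PRIVS) failed")

def no_new_privs_flag():
    """Read the NoNewPrivs field back from /proc/self/status."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("NoNewPrivs:"):
                return int(line.split()[1])
    return None  # very old kernels do not report the field

set_no_new_privs()
print(no_new_privs_flag())  # prints 1 once the bit is set
```

This is what docker's --security-opt=no-new-privileges arranges for the container's first process before it execs.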
Here is an example showcasing how it helps in docker:
Create a setuid binary that displays the effective uid
[$ dockerfiles]# cat testnnp.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main(int argc, char *argv[])
{
printf("Effective uid: %d\n", geteuid());
return 0;
}
[$ dockerfiles]# make testnnp
cc testnnp.c -o testnnp
Now we will add the binary to a docker image
[$ dockerfiles]# cat Dockerfile
FROM fedora:latest
ADD testnnp /root/testnnp
RUN chmod +s /root/testnnp
ENTRYPOINT /root/testnnp
[$ dockerfiles]# docker build -t testnnp .
Sending build context to Docker daemon 12.29 kB
Step 1 : FROM fedora:latest
---> 760a896a323f
Step 2 : ADD testnnp /root/testnnp
---> 6c700f277948
Removing intermediate container 0981144fe404
Step 3 : RUN chmod +s /root/testnnp
---> Running in c1215bfbe825
---> f1f07d05a691
Removing intermediate container c1215bfbe825
Step 4 : ENTRYPOINT /root/testnnp
---> Running in 5a4d324d54fa
---> 44f767c67e30
Removing intermediate container 5a4d324d54fa
Successfully built 44f767c67e30
Now we will create and run a container without no-new-privileges.
[$ dockerfiles]# docker run -it --rm --user=1000 testnnp
Effective uid: 0
This shows that even though you requested a non-privileged user (UID=1000) to run your container, that user would be able to become root by executing the setuid app in the container image.
Running with no-new-privileges prevents the uid transition while running a setuid binary
[$ dockerfiles]# docker run -it --rm --user=1000 --security-opt=no-new-privileges testnnp
Effective uid: 1000
As you can see above, the container process is still running as UID=1000, meaning that even if the image has dangerous code in it, we can still prevent the user from escalating privileges.
If you want to allow users to run images as a non-privileged UID, in most cases you would want to prevent them from becoming root. no_new_privileges is a great tool for guaranteeing this.
The kubernetes guestbook example was updated a week ago with the following: "Rewrite guestbook example to use kube-dns instead of host:port"
kubernetes/kubernetes@4679a37
This means that currently, following the getting started guide results in a non-functional demo system. To make it work, I had to create a replication controller and service by editing the templated yaml.in files from kubernetes/cluster/addons/dns/, then modify KUBELET_ARGS to include cluster_dns and cluster_domain arguments.
This post will walk you through the steps of running a runc container using an OCI configuration.
We will walk through two examples: one for running a Fedora container and another for running a Redis container.
There are three steps to running a runc container: install ocitools, construct a rootfs, and generate a configuration before starting the container.
ocitools is a set of utilities for working with the OCI specification. We are going to make use of the generate utility, which produces an OCI configuration for runc from a command line similar to docker run.
$ export GOPATH=/some/dir
$ go get github.com/opencontainers/ocitools
$ cd $GOPATH/src/github.com/opencontainers/ocitools
$ make && make install
Note: This will soon be available as a package on Fedora/RHEL, just like runc.
There are various ways to construct a rootfs. Ultimately, it is just a directory with a bunch of files that will be visible and used inside your container. We will use dnf to construct a rootfs.
First we create a directory for our container:
$ mkdir /runc/containers/fedora/rootfs
$ cd /runc/containers/fedora
$ dnf install --installroot /runc/containers/fedora/rootfs bash coreutils procps-ng iproute
Next we generate a configuration using ocitools.
$ ocitools generate --args bash
This creates a config.json in the current directory. By passing --args we change the command to be run to bash instead of the default.
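The generated file looks roughly like the excerpt below — an illustrative sketch, not the exact output of this ocitools version; field names follow the OCI runtime spec, and the ociVersion value will match whatever spec release your ocitools was built against:

```json
{
  "ociVersion": "1.0.0",
  "root": {
    "path": "rootfs",
    "readonly": false
  },
  "process": {
    "terminal": true,
    "user": { "uid": 0, "gid": 0 },
    "args": [ "bash" ],
    "cwd": "/"
  }
}
```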
And finally we start a container.
$ runc start fedora
bash-4.3# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 18:02 ? 00:00:00 bash
root 7 1 0 18:03 ? 00:00:00 ps -ef
bash-4.3# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
bash-4.3# ls -l
total 48
lrwxrwxrwx. 1 root root 7 Feb 3 22:10 bin -> usr/bin
dr-xr-xr-x. 2 root root 4096 Feb 3 22:10 boot
drwxr-xr-x. 5 root root 360 Apr 8 18:02 dev
drwxr-xr-x. 29 root root 4096 Apr 8 17:45 etc
drwxr-xr-x. 2 root root 4096 Feb 3 22:10 home
lrwxrwxrwx. 1 root root 7 Feb 3 22:10 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Feb 3 22:10 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 4096 Feb 3 22:10 media
drwxr-xr-x. 2 root root 4096 Feb 3 22:10 mnt
drwxr-xr-x. 2 root root 4096 Feb 3 22:10 opt
dr-xr-xr-x. 287 root root 0 Apr 8 18:02 proc
dr-xr-x---. 2 root root 4096 Apr 8 17:43 root
drwxr-xr-x. 5 root root 4096 Apr 8 17:45 run
lrwxrwxrwx. 1 root root 8 Feb 3 22:10 sbin -> usr/sbin
drwxr-xr-x. 2 root root 4096 Feb 3 22:10 srv
dr-xr-xr-x. 13 root root 0 Apr 7 22:19 sys
drwxrwxrwt. 2 root root 4096 Apr 8 17:45 tmp
drwxr-xr-x. 12 root root 4096 Apr 8 17:42 usr
drwxr-xr-x. 19 root root 4096 Apr 8 17:45 var
bash-4.3#
We can list the running containers by using runc list in another terminal.
$ runc list
ID PID STATUS BUNDLE CREATED
fedora 7770 running /runc 2016-04-08T18:02:12.186900248Z
We can exec another process in the container using runc exec
[root@dhcp-16-129 ~]# runc exec fedora sh
sh-4.3# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 18:02 ? 00:00:00 bash
root 10 0 0 18:04 ? 00:00:00 sh
root 16 10 0 18:04 ? 00:00:00 ps -ef
sh-4.3# exit
exit
We create a rootfs using dnf, just like we did for the Fedora container.
$ mkdir /runc/containers/redis/rootfs
$ cd /runc/containers/redis
$ dnf install --installroot /runc/containers/redis/rootfs redis
Next, we generate a configuration using ocitools
$ ocitools generate --args /usr/bin/redis-server --network host
We customize the args to start the redis-server, and we set the network to host, which means that we will use the host network stack instead of creating a new network namespace for the container.
Next, we start the redis container
$ runc start redis
1:C 08 Apr 18:08:22.665 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
1:M 08 Apr 18:08:22.667 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 08 Apr 18:08:22.667 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 08 Apr 18:08:22.667 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.0.6 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
1:M 08 Apr 18:08:22.669 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 08 Apr 18:08:22.669 # Server started, Redis version 3.0.6
1:M 08 Apr 18:08:22.670 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 08 Apr 18:08:22.670 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 08 Apr 18:08:22.670 * DB loaded from disk: 0.001 seconds
1:M 08 Apr 18:08:22.670 * The server is now ready to accept connections on port 6379
We have our redis-server up and running!
We can try to connect to it from a redis-cli by exec'ing into the container.
$ runc exec redis redis-cli
127.0.0.1:6379>
127.0.0.1:6379> set name mrunal
OK
127.0.0.1:6379> get name
"mrunal"
127.0.0.1:6379> quit
These are some simple examples to get started with OCI. ocitools generate allows tailoring the configuration with flags. We use the host networking stack as a convenience. In a future post, we will delve into OCI hooks, which allow setting up container networking.
A developer contacted me about building a container which will run as a log aggregator for fluentd. This container needed to be an SPC (super privileged container) that would manage parts of the host system, namely the log files under /var/log.
Being a good conscientious developer, he wanted to run his application as securely as possible.
The option he wanted to avoid was running the container in --privileged mode, which removes all security from the container. When he ran his container, SELinux complained about the container processes trying to read the log files.
He asked me if there was a way to run a container where SELinux would allow the access but the container process could still be confined. I suggested that he could disable SELinux protections for just this container, leaving SELinux enforcing on for the other containers and for the host.
docker run -d --security-opt label:disable -v /var/log:/var/log fluentd
We did not like this solution. I believe SELinux provides the best security separation currently available for containers.
Another option we talked about was relabeling the content in the /var/log directories:
docker run -d -v /var/log:/var/log:Z fluentd
The problem with this is that all of the files under /var/log would now be labeled with a container-specific label (svirt_sandbox_file_t). Other parts of the host system, like logrotate and log scanners, would then be blocked from accessing the log files.
The best option we came up with was to generate a new type to run the container with. We needed to write a little bit of SELinux policy to make this happen. Here is what I came up with:
cat container_logger.te
policy_module(container_logger, 1.0)
virt_sandbox_domain_template(container_logger)
##############################
# virt_sandbox_net_domain(container_logger_t)
gen_require(`
attribute sandbox_net_domain;
')
typeattribute container_logger_t sandbox_net_domain;
##############################
logging_manage_all_logs(container_logger_t)
Compile and install the policy:
make -f /usr/share/selinux/devel/Makefile container_logger.pp
semodule -i container_logger.pp
Run the container with the new policy.
docker run -d -v /var/log:/var/log --security-opt label:type:container_logger_t --name logger fluentd
Exec into the container to make sure you can read/write the log files.
docker exec -ti logger cat /var/log/messages
docker exec -ti logger touch /var/log/foobar
docker exec -ti logger rm /var/log/foobar
Everything works!
policy_module(container_logger, 1.0)
policy_module names the policy and brings in all the standard policy definitions. All policy type enforcement (.te) files start with this call.
virt_sandbox_domain_template(container_logger)
virt_sandbox_domain_template is a template macro that actually creates the container_logger_t type and sets up all of the policy so that the docker process (docker_t) can transition to it. It also defines rules that allow it to manage svirt_sandbox_file_t files, and sets it up to be MCS-separated, meaning it will only be able to use its own content and no other container's content, whether the container is running as the default type svirt_lxc_net_t or a custom type.
##############################
# virt_sandbox_net_domain(container_logger_t)
gen_require(`
attribute sandbox_net_domain;
')
typeattribute container_logger_t sandbox_net_domain;
##############################
This section will eventually be an interface, virt_sandbox_net_domain. (I sent a patch to the upstream selinux-policy package to add this interface.) The interface just assigns an attribute to container_logger_t. Attributes bring in lots of policy rules; basically, this attribute gives full network access to container_logger_t processes. If your container did not need access to the network, or you wanted to tighten the network ports that container_logger_t would be able to listen on or connect to, you would not use this interface.
logging_manage_all_logs(container_logger_t)
This last interface, logging_manage_all_logs, gives container_logger_t the ability to manage all of the log file types. SELinux interfaces are defined and shipped under /usr/share/selinux/devel.
Adding a fairly simple policy module allows us to run the container as securely as possible and still get the job done.
Documentation for Atomic Registry is being developed based on the OpenShift Docs repo. We want to publish these nightly from the master branch at docs.projectatomic/registry
Following the 'getting started' instructions at http://www.projectatomic.io/docs/gettingstarted/, I noticed an error when pulling the docker registry image.
e94834ac9522: Error pulling image (latest) from docker.io/registry, ApplyLayer exit status 1 stdout: stderr: unexpected EOF
FATA[0038] Error pulling image (latest) from docker.io/registry, ApplyLayer exit status 1 stdout: stderr: unexpected EOF
[centos@megacore01 ~]$ sudo docker create -p 5000:5000 -v /var/lib/local-registry:/srv/registry -e STANDALONE=false -e MIRROR_SOURCE=https://registry-1.docker.io -e MIRROR_SOURCE_INDEX=https://index.docker.io -e STORAGE_PATH=/srv/registry --name=local-registry registry
Unable to find image 'registry:latest' locally
latest: Pulling from docker.io/registry
I just think the notes need to be updated to follow these instructions: https://docs.docker.com/registry/deploying/. This will deploy the docker registry 2.0 server, from what I understand. When I create that container, the issue seems to clear up, so I ended up using registry 2.0. (I hope that's correct?)
[centos@megacore01 ~]$ sudo docker create -p 5000:5000 \
>   -v /var/lib/local-registry:/srv/registry \
>   -e STANDALONE=false \
>   -e MIRROR_SOURCE=https://registry-1.docker.io \
>   -e MIRROR_SOURCE_INDEX=https://index.docker.io \
>   -e STORAGE_PATH=/srv/registry \
>   --name=local-registry registry:2
Unable to find image 'registry:2' locally
2: Pulling from docker.io/registry
d3a1f33e8a5a: Pull complete
c22013c84729: Pull complete
d74508fb6632: Pull complete
91e54dfb1179: Pull complete
5c3e6bcaa8b0: Pull complete
a5b8dc690ce7: Pull complete
e4aee72fc6c3: Pull complete
76b7062ceb9a: Pull complete
6228a99f9630: Pull complete
e024fb496e6b: Pull complete
1e847b14150e: Already exists
docker.io/registry:2: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Digest: sha256:0631faa22077d0494e9ac3d7e90bac3eeb6fd9f579cbf7b87dab1cda85c86f73
Status: Downloaded newer image for docker.io/registry:2
eebd0f819ac4352913e94a654d31a9bc2ea6575e2f7a17632a419430b2697274
[centos@megacore01 ~]$
Service seems to be running fine...
[centos@megacore01 ~]$ sudo systemctl status local-registry
local-registry.service - Local Docker Mirror registry cache
   Loaded: loaded (/etc/systemd/system/local-registry.service; enabled)
   Active: active (running) since Mon 2015-08-31 18:59:24 UTC; 18s ago
 Main PID: 2751 (docker)
   CGroup: /system.slice/local-registry.service
           └─2751 /usr/bin/docker start -a local-registry
Aug 31 18:59:24 megacore01.jinkit.com.novalocal systemd[1]: Started Local Docker Mirror registry cache.
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=warning msg="No HTTP secret provided - generated random secret. This may cause problems with upl...
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="redis not configured" instance.id=ad99935f-6dcd-48b7-8a93-f962deba2318 version=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="using inmemory blob descriptor cache" instance.id=ad99935f-6dcd-48b7-8a93-f...sion=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="listening on [::]:5000" instance.id=ad99935f-6dcd-48b7-8a93-f962deba2318 version=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="Starting upload purge in 16m0s" instance.id=ad99935f-6dcd-48b7-8a93-f962deb...sion=v2.1.1
Hint: Some lines were ellipsized, use -l to show in full.
I recently received an email asking about --selinux-enabled in the docker daemon. I thought others might wonder about this, so I wrote this blog.
I'm currently researching the topic of `--selinux-enabled` in docker and what it is doing when set to TRUE.
From what I'm seeing, it simply will set contexts and labels on the services (docker daemon) when SELinux is enabled on the system and not using OverlayFS.
But I'm wondering if that is even correct, and if so, what else is happening when setting `--selinux-enabled` to TRUE.
--selinux-enabled on the docker daemon causes it to set SELinux labels on the containers. Docker reads the contexts file /etc/selinux/targeted/contexts/lxc_contexts for the default contexts to use when running containers.
cat /etc/selinux/targeted/contexts/lxc_contexts
process = "system_u:system_r:svirt_lxc_net_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:svirt_sandbox_file_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_lxc_process = "system_u:system_r:svirt_lxc_net_t:s0"
Docker by default uses the confined SELinux type svirt_lxc_net_t to isolate the container processes from the host, and it generates a unique MCS label to allow SELinux to prevent one container's processes from attacking other containers' processes and content.
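The MCS part of the label is just a pair of category numbers; a sketch of how a unique pair could be picked (illustrative only, not docker's actual implementation):

```python
import random

def random_mcs_label():
    """Pick two distinct MCS categories (c0..c1023) and return them in
    canonical low,high order, e.g. 's0:c57,c245'. Two containers with
    different category pairs cannot touch each other's files or processes."""
    low, high = sorted(random.sample(range(1024), 2))
    return "s0:c%d,c%d" % (low, high)

print(random_mcs_label())
```

The full process label is then the lxc_contexts process context with this MCS level appended, e.g. system_u:system_r:svirt_lxc_net_t:s0:c57,c245.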
If you don't specify --selinux-enabled, Docker does not execute SELinux code to set labels. When docker launches a container process, the system falls back to the default transition policy. This means the container processes will run as either docker_t or spc_t (depending on the version of policy you have installed). Both of these types are unconfined, so SELinux will provide no security separation for these container processes.
In addition, I'm also wondering what the impact will be when `--selinux-enabled` is set to TRUE together with `--icc` set to TRUE. Does it have any impact, or is it unrelated?
Could it be possible that with `--icc` and `--selinux-enabled` set to TRUE we might have challenges when communicating between containers when the connection is not specified (as it should be when `--icc` is set to FALSE)?
This is unrelated. --icc=false sets up iptables rules that prevent containers from connecting to each other over the local network. The default SELinux policy allows all network connectivity between containers.
Also, does anybody know what will happen if I run `docker` without `--selinux-enabled` for a while (so start containers, pull images, etc.) and then restart the daemon with `--selinux-enabled`? Is it possible that it impacts the environment somehow and that certain activities need to be done, or shouldn't this affect the service at all?
The only potential problem I can think of would be volume mounts being mislabeled. If you volume mount content off of the host with -v /source/path:/dest/path:Z while SELinux is disabled, docker will not set up labels for the container. If you later turn on --selinux-enabled, these containers would not be able to read/write the content in the volume they mounted. Thinking about this further, I am not sure what labels the containers created while SELinux was disabled would get. These pre-created containers would probably continue to run with an unconfined domain, and newer containers would run with confinement.
@tigert I think this is for you.
I'd like to design a root page for the new docs.projectatomic.io. My thought was simply a switchboard with a grid of project icons (with names and a 10-word description), which, when you click on them, brings you to the appropriate set of docs. But I am completely willing to be swayed by someone with actual web design skills to do it another way.
We attempt to ship new versions of Project Atomic every 6 weeks. I am in charge of the docker portion of each release. I also lead the team developing the atomic command. Just because Docker releases a new version does not mean it instantly gets into the RHEL release. We would like to allow our QE team time for testing, and to make sure it is "Enterprise Ready". Towards the end of this 6-week period, Docker released an updated Docker 1.6.1 package with fixes for a series of CVEs.
Red Hat's Security Response Team analysed these CVEs and found them not severe enough to be worth waiting for. Of course, should there be any issues of actual importance, Red Hat will do everything it can to have a timely (but tested) asynchronous release.
Trevor Jay from Red Hat Security states
Technically speaking, these don't cross any trust boundaries. Docker images are root-run software. They can drop or restrict permissions and capabilities so that you're protected should they become compromised just like any other software that starts with elevated privileges, but you are inherently trusting the image itself to be well-written (to take advantage of the safeties we provide) and non-malicious.
This is all about trusting the applications you install on your system. Sometimes I worry that people have the opinion "any piece of software I install is safe, as long as it is in a container." I believe Docker is playing whack-a-mole with these vulnerabilities, and preventing them entirely is going to be near impossible.
Trevor continues.
Container safety is about restricting what can happen when your application get owned, not about randomly running potential malware.
Hope I don't sound like a broken record, since I have covered this before.
The guide mentions that I should "configure answers.conf by copying Nulecule/answers.conf.sample to answers.conf", but there is no answers.conf.sample:
$ ll Nulecule/answers.conf.sample
ls: cannot access 'Nulecule/answers.conf.sample': No such file or directory
$ pwd
/home/tt/g/atomic-site
$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- launchy (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
from ./create-post.rb:4:in `<main>'
$ gem install launchy
Fetching: launchy-2.4.3.gem (100%)
Successfully installed launchy-2.4.3
Parsing documentation for launchy-2.4.3
Installing ri documentation for launchy-2.4.3
Done installing documentation for launchy after 1 seconds
1 gem installed
$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- chronic (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
from ./create-post.rb:6:in `<main>'
$ gem install chronic
Fetching: chronic-0.10.2.gem (100%)
Successfully installed chronic-0.10.2
Parsing documentation for chronic-0.10.2
Installing ri documentation for chronic-0.10.2
Done installing documentation for chronic after 1 seconds
1 gem installed
$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- slop (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
from ./create-post.rb:7:in `<main>'
$ gem install slop
Fetching: slop-4.2.0.gem (100%)
Successfully installed slop-4.2.0
Parsing documentation for slop-4.2.0
Installing ri documentation for slop-4.2.0
Done installing documentation for slop after 0 seconds
1 gem installed
$ ./create-post.rb
./create-post.rb:15:in `<main>': undefined method `parse!' for Slop:Module (NoMethodError)
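The three LoadErrors (and the final Slop error) suggest the script's dependencies were missing from the repo's Gemfile. A sketch of the entries that would let a single bundle install handle it — the version pins are assumptions, chosen because create-post.rb calls Slop.parse!, which exists in slop 3.x but was removed in slop 4.x (hence the final NoMethodError):

```ruby
# Gemfile entries needed by create-post.rb (versions are illustrative)
gem "launchy", "~> 2.4"
gem "chronic", "~> 0.10"
# create-post.rb uses Slop.parse!, which slop 4.x no longer provides
gem "slop", "~> 3.6"
```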
I was asked a question about running users inside of a docker container: could they still get privileges?
For more background on Linux capabilities, see: http://linux.die.net/man/7/capabilities
We'll start with a simple container where the primary process is running as root. One can look at the capabilities of the current process via grep Cap /proc/self/status. There is also a capsh utility.
# docker run --rm -ti fedora grep Cap /proc/self/status
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
Notice that the Effective Capabilities (CapEff) value is non-zero, which means that the process has capabilities.
Using the pscap tool, I see that the process has these capabilities.
chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap
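Those names are simply the set bits of the CapEff mask shown above. A small decoder sketch — the name table covers only the capabilities in docker's default set, with capability numbers taken from <linux/capability.h>:

```python
# Capability bit numbers from <linux/capability.h>; only the capabilities
# in docker's default bounding set are named here.
CAP_NAMES = {
    0: "chown", 1: "dac_override", 3: "fowner", 4: "fsetid", 5: "kill",
    6: "setgid", 7: "setuid", 8: "setpcap", 10: "net_bind_service",
    13: "net_raw", 18: "sys_chroot", 27: "mknod", 29: "audit_write",
    31: "setfcap",
}

def decode_caps(hex_mask):
    """Return the capability names whose bits are set in a CapEff/CapBnd
    mask, as printed in /proc/self/status."""
    mask = int(hex_mask, 16)
    return [CAP_NAMES.get(bit, "cap_%d" % bit)
            for bit in range(64) if mask & (1 << bit)]

print(decode_caps("00000000a80425fb"))
```

Decoding 00000000a80425fb yields exactly the fourteen capabilities listed above; the capsh --decode command does the same job.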
Now let's run a container as a non-root user using the -u option.
docker run -u 3267 fedora grep Cap /proc/self/status
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
Notice that the Effective capabilities (CapEff) are all zero, but the bounding set of capabilities (CapBnd) is not. This means that if there is a setuid binary included in the image, it would be possible to gain these capabilities. Notice also, not surprisingly, that this number matches the previous container.
So even though this process is running as non root inside the container, it could potentially run with the same capabilities as above if the image builder included a setuid binary.
chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap
Docker has a nice feature where you can drop all capabilities via --cap-drop=all. Now, if we execute the same container with a non-privileged user and drop all capabilities:
# docker run --rm -ti --cap-drop=all -u 3267 fedora grep Cap /proc/self/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000
Now this user cannot gain any capabilities on the system. I would advise almost all Docker users that run their containers with non-privileged users to use this feature. It adds a lot of security to the system.
The link to "Nulecule examples directory" on http://www.projectatomic.io/docs/atomicapp/ under the section named "What is Atomic App?" is broken. It points to https://github.com/projectatomic/nulecule/tree/master/examples.
The correct link is perhaps https://github.com/projectatomic/nulecule-library
Just a placeholder to remind me I have an email from Langdon with some comments about the site that need to be addressed.
Hi,
I am getting the following errors while running the Docker build:
➜ atomic-site git:(build_fix) ✗ ./docker.sh
Sending build context to Docker daemon 27.43 MB
Step 1 : FROM fedora:23
---> 3944b65d6ed6
Step 2 : MAINTAINER [email protected]
---> Using cache
---> 6c8d31f469ea
Step 3 : WORKDIR /tmp
---> Using cache
---> bad340bbe395
Step 4 : RUN dnf install -y tar libcurl-devel zlib-devel patch rubygem-bundler ruby-devel git make gcc gcc-c++ redhat-rpm-config && dnf clean all
---> Using cache
---> 2e22b2e68693
Step 5 : ADD config.rb /tmp/config.rb
---> Using cache
---> 5bb930c2556c
Step 6 : ADD Gemfile /tmp/Gemfile
---> Using cache
---> cd8f7277ee0e
Step 7 : ADD Gemfile.lock /tmp/Gemfile.lock
---> Using cache
---> e87dbf26294e
Step 8 : ADD lib /tmp/lib
---> Using cache
---> 79401607dae4
Step 9 : RUN bundle install
---> Using cache
---> 26b5551e2e95
Step 10 : EXPOSE 4567
---> Using cache
---> f8152326bd01
Step 11 : ENTRYPOINT bundle exec
---> Using cache
---> f32491e251fa
Step 12 : CMD middleman server
---> Using cache
---> 9f6418bb1840
Successfully built 9f6418bb1840
== The Middleman is loading
Updating git submodules...
fatal: Not a git repository (or any of the parent directories): .git
fatal: Not a git repository (or any of the parent directories): .git
Other observations:
The site is served at 127.0.0.1:4567 when browsing the running site.
When approaching the forum, a new user might want to get hold of an admin or some other "authorized" person. A legal notice might also be useful. A good place would be the about page, http://ask.projectatomic.io/en/about/, linked in the footer.
When using SELinux for controlling processes within a container, you need to make sure any content that gets volume mounted into the container is readable and potentially
writable depending on the use case.
By default, docker container processes run with the system_u:system_r:svirt_lxc_net_t:s0 label. The svirt_lxc_net_t type is allowed to read/execute most content under /usr, but it is not allowed to use most other types on the system. If you want to volume mount content under /var, for example, into a container, you need to set the labels on this content. We mention this in the docker run man page.
man docker-run
...
When using SELinux, be aware that the host has no knowledge of container SELinux policy.
Therefore, in the above example, if SELinux policy is enforced, the /var/db directory is not
writable to the container. A "Permission Denied" message will occur, and an avc: message will appear in the host's syslog.
To work around this, at the time of writing this man page, the following command needs to be run in order for the proper SELinux policy type label to be attached to the host directory:
# chcon -Rt svirt_sandbox_file_t /var/db
This got easier recently since Docker finally merged a patch which will show up in docker-1.7 (we have been carrying the patch in docker-1.6 on RHEL, CentOS, and Fedora).
This patch adds support for "z" and "Z" as options on the volume mounts (-v).
For example:
docker run -v /var/db:/var/db:z rhel7 /bin/sh
will automatically do the chcon -Rt svirt_sandbox_file_t /var/db described in the man page.
Even better,
docker run -v /var/db:/var/db:Z rhel7 /bin/sh
will label the content inside the container with the exact MCS label that the container will run with. Basically it runs:
chcon -Rt svirt_sandbox_file_t -l s0:c1,c2 /var/db
where s0:c1,c2 differs for each container.
I have a bugzilla that was reported to me, that might become common.
https://bugzilla.redhat.com/show_bug.cgi?id=1230098
The user got AVC's that looked like the following.
Raw Audit Messages
type=AVC msg=audit(1433926625.524:1347): avc: denied { write } for pid=29280 comm="launch" name="addons" dev="dm-2" ino=2491404 scontext=system_u:system_r:svirt_lxc_net_t:s0:c147,c266 tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c372,c410 tclass=dir permissive=0
Notice the differing MCS labels (s0:c147,c266 versus s0:c372,c410).
The problem here was that the user created a volume and labeled it with "Z" from one container, then attempted to share it with another container, which SELinux denied since the MCS labels differed. As I described in the bugzilla,
I told the reporter that he needed to mount it using Z or z, since at some point the volume got a different container's labels on it.
Your container processes are running with the label system_u:system_r:svirt_lxc_net_t:s0:c147,c266. Notice the s0:c147,c266.
The volume mount is labeled system_u:object_r:svirt_sandbox_file_t:s0:c372,c410. Notice the MCS label s0:c372,c410.
MCS security prevents read/write of content with different MCS labels. However, it will allow read/write of content labeled with an s0 label.
If you volume mount an image with -v /SOURCE:/DESTINATION:z, docker will automatically relabel the content for you to s0. If you volume mount with "Z", then the label will be specific to the container, and will not be able to be shared between containers.
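A sketch of how one might confirm a mismatch like this (here /tmp stands in for the volume directory, and the docker inspect line assumes a running container):

```shell
# Show the SELinux label on the volume directory; on an SELinux host this
# prints something like system_u:object_r:svirt_sandbox_file_t:s0:c372,c410.
ls -Zd /tmp
# To compare against the label the container process runs with, inspect the
# container (CONTAINER_ID is a placeholder):
#   docker inspect --format '{{ .ProcessLabel }}' CONTAINER_ID
# If the MCS pairs (s0:cX,cY) differ, re-run with :z (shared) or :Z
# (private) so docker relabels the volume.
```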
Please add a maintainers file with who can commit and who can rebuild the site. This will be useful for people to be able to ping the right people with issues.
The documentation on http://www.projectatomic.io/docs/gettingstarted/ mentions adding a "drop-in" for systemd to override docker. However, the example file is incorrect and yields the following error when starting up docker:
Aug 07 23:05:45 atomic01.kaos.realm systemd[1]: docker.service has more than one ExecStart= setting, which is only allowed for Type=oneshot services. Refusing.
There was an open bug on the docker GitHub that references how to fix this issue.
The solution is simply to add an empty
ExecStart=
before the full ExecStart= line. I've tested this locally and it does indeed work as expected.
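For reference, a drop-in sketch applying the fix; the file path follows the usual systemd drop-in convention, and the ExecStart command line here is illustrative, not the exact one from the guide:

```ini
# /etc/systemd/system/docker.service.d/override.conf
[Service]
# An empty ExecStart= clears the value inherited from docker.service;
# without it, systemd sees two ExecStart= lines and refuses to start
# (only Type=oneshot services may have more than one).
ExecStart=
ExecStart=/usr/bin/docker daemon --selinux-enabled
```

Run systemctl daemon-reload after editing the drop-in so systemd picks up the change.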
Now that we have docs.projectatomic.io, we want to move all docs to it. However, we don't want to break everyone's links. Ideas on how to make redirects work for individual doc pages?
Won't work on atomic, I suggest:
mv /var/lib/docker{,.backup}
or so.
We should probably try to add social media buttons so people can easily tweet, share on Facebook, etc.
Adding tigert to the ticket for discussion. IIRC there are some ways to do this, but some of them involve some objectionable JS.
I received a bug report that it was not working with SELinux. Turns out there was a simple fix which should get into docker-1.7 or docker-1.8 to fully support labeling within a docker container using the ZFS back end.
Sadly ZFS can not be merged into the Linux kernel because of Licensing issues. But if you are willing to use loadable kernel modules you can now run your containers on ZFS with SELinux enforcement.
I recently published an article on the new Atomic command.
One of the questions I got was about command substitution. The user of the tool wanted standard bash substitutions to work, specifically substitutions like $PWD or $(id -u).
I decided to try this out by creating a simple Dockerfile.
FROM rhel7
LABEL RUN echo $PWD/.foobar --user=$(id -u)
Build the container
docker build -t test .
Execute atomic run test
atomic run test
echo
Looking at the label using docker inspect, I see that building the container dropped the $() content.
Changing it to quote the substitutions:
FROM rhel7
LABEL RUN echo '$PWD/.foobar' '--user=$(id -u)'
Build the container
docker build -t test .
Execute atomic run test
atomic run test
echo $PWD/.foobar --user=$(id -u)
/root/test1/.foobar --user=0
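The underlying shell behavior can be seen without docker or atomic at all: single quotes defer expansion until the string is later evaluated, which is what the quoted LABEL relies on (the variable name cmd is just for illustration):

```shell
# Store the command with single quotes so nothing expands at assignment time.
cmd='echo $PWD/.foobar --user=$(id -u)'
# At evaluation time the shell expands $PWD and runs $(id -u), mirroring
# what happens when atomic executes the LABEL RUN line.
eval "$cmd"
```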
Hi,
I was going through building the site and found that the "Running the atomic.app" steps are not working for me.
I did all the necessary steps mentioned in First time setup.
I am getting some kind of info message, not an error.
# answers.conf
➜ atomic-site git:(master) ✗ cat Nulecule/answers.conf
[atomic-site]
datadir = /home/budhram/redhat/atomic-site/data
hostport = 4567
sourcedir = /home/budhram/redhat/atomic-site/source
image = jberkus/atomic-site
provider = kubernetes
[general]
namespace = default
provider = kubernetes
➜ atomic-site git:(master) ✗ sudo atomicapp run Nulecule
[INFO] - main.py - Action/Mode Selected is: run
[INFO] - kubernetes.py - Using namespace default
[INFO] - kubernetes.py - trying kubectl at /usr/bin/kubectl
[INFO] - kubernetes.py - found kubectl at /usr/bin/kubectl
[INFO] - kubernetes.py - Deploying to Kubernetes
Your application resides in Nulecule
Please use this directory for managing your application
Is anything wrong in my setup?
Cockpit is no longer part of Atomic, AFAIK, so it should be removed from the website also.
... currently it doesn't indent enough, so that blockquotes look like just a font change.
Reported on Ask: http://ask.projectatomic.io/en/question/3583/bare-metal-install-bootiso/
The boot.iso link in the second section and the Fedora guide links throughout are outdated.
One of the reasons Docker has taken off is because it made it easier for developers to ship and update their software.
It streamlined the development process and allowed developers to choose the runtime environments that their applications
are going to run in. The runtime/userspace that the developer chooses gets tested by the QE team, and the exact same runtime gets executed in production.
Part of the development process is usually built around updating the application and the userspace. The developer does things like yum or dnf install, and then copies in code that is particular to his application. But once the developer is done, he usually expects QE and production to treat this content as read-only. I believe that the image of a container should be put into production in read-only mode, which would prevent the application or processes within the container from writing to the container; it would only allow these processes to write to volumes mounted into the container.
From a security point of view, this is great. Imagine you are running an application that gets hacked. The first thing the hacker wants to do is write his hack into the application, so that the next time the application starts up, it starts up with the hacker's code. If the image were read-only, the hacker would be prevented from leaving his back door (he would have to break in again). Docker added a feature to handle this via docker run -d --read-only image ...
a while ago. But it is difficult to use, since a lot of applications need to write to temporary directories like /run or /tmp; since these are
read-only, the apps fail. You could set up temporary locations on your host to volume mount into the container, but this ended up exposing temporary data to the host.
I wanted to be able to mount a tmpfs into the container. I have been working on a fix for this since last May and it was finally merged in on 12/2/15. moby/moby#13587
It will show up in docker-1.10.
With this patch you can set up tmpfs on /run and /tmp.
docker run -d --read-only --tmpfs /run --tmpfs /tmp IMAGE
I would actually recommend applications in production run with this mode.
You might want to continue to mount in volumes from the host for permanent data.
One cool thing about this patch is that it tars up the contents of the underlying directory in the image on top of the tmpfs. So if you have a /run/httpd directory in the container image, you will have /run/httpd in the container's tmpfs.
You can also do some other stuff with this patch like setting up a temporary /var or /etc inside of your container.
If you execute
docker run -ti --tmpfs /etc fedora /bin/sh
It will mount a tmpfs on /etc, and tar up the content of the underlying /etc onto the new /etc tmpfs. This means you can make changes to /etc, but they will not survive a container stop. The container will be fresh every time you start and stop it.
You can also pass tmpfs mount options on the command line
docker run -ti --tmpfs /etc:rw,noexec,nosuid,size=2g container
Docker will pass down these mount options.
It would be nice in the future if developers told people to run in production with the --read-only docker run/create flag, or, better yet, set up atomic labels to do this by default. Then we could separate the way a container application runs in development from the way it runs in production.
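One way such a default could be sketched with an atomic label; the LABEL RUN convention is the one atomic reads, while the base image, flags, and the \${NAME}/\${IMAGE} substitution variables shown here are assumptions for illustration:

```dockerfile
FROM rhel7
# atomic run executes this command line instead of a plain docker run, so
# the image ships read-only mode and tmpfs mounts as its production default.
LABEL RUN docker run -d --read-only --tmpfs /run --tmpfs /tmp --name \${NAME} \${IMAGE}
```

The backslashes keep the Dockerfile build from expanding the variables; they are left for atomic to substitute at run time.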
I found two resources:
Reading the above resources does not give a clear answer as to what is represented by x and y in the /ostree/boot.x.y/ path.
Since we are starting to get more and more infrastructure, I just noticed that we do not have any privacy policy explaining what we are doing with the information and everything.
Clicking Quick Start on http://www.projectatomic.io/registry/ brings me to http://docs.projectatomic.io/registry/latest/registry_quickstart/index.html, which makes the first three links on the sidebar not work (Get Started, Documentation, and Developer Community).
Under http://docs.projectatomic.io/registry/latest/install_config/syncing_groups_with_ldap.html, where it says "LDAP Client Configuration", the formatting is very hard to read (purple key names, light red key values). There are a few other places where this is happening too.
Draft is here - and needs review please:
PR will come tomorrow (Friday 17 June). I will be traveling and out of pocket from 18 June so I need to get this to a point where it can be merged easily.
http://etherpad.osuosl.org/4tlMydVGzp
Open Issues:
1 - I need a place to post some slides from one of the speakers. Can I put them into this repo?
2 - I need to put some pictures into this post. @eliskasl is going to provide some when she gets them off of her camera.
"Atomic Registry"
I often get bug reports asking:
Why can't I use docker as a non-root user, by default?
Docker has the ability to change the ownership of /run/docker.sock to have permissions of 660, with group ownership set to the docker group. This would allow users added to the docker group to run docker containers without having to execute sudo or su to become root. Sounds great...
ls -l /var/run/docker.sock
srw-rw----. 1 root docker 0 Aug 3 13:02 /var/run/docker.sock
On RHEL, Fedora and Centos we prefer to have the docker.socket set like:
ls -l /var/run/docker.sock
srw-rw----. 1 root root 0 Aug 3 13:02 /var/run/docker.sock
If a user can talk to the docker socket, they can execute the following command:
docker run -ti --privileged -v /:/host fedora chroot /host
Giving them full root access to the host system.
It is similar to giving them the following in sudo.
grep dwalsh /etc/sudoers
dwalsh ALL=(ALL) NOPASSWD: ALL
Which would allow them to run sudo sh and get the same access. But there is one big flaw with this:
Docker has no auditing or logging built in, while sudo does.
Docker currently records events but the events disappear when the docker daemon is restarted. Docker does not currently do any auditing.
From a security perspective, Red Hat has expressed concerns with enabling access to the docker daemon from non-root users, absent auditing and proper logging. We've implemented those controls in PR14446 and are awaiting merge. Short term, it is recommended to implement sudo rules to permit access to the docker daemon. Sudo will then provide logging and audit.
This patch provides much needed system logging for docker's API functions. With this patch, when an API request is made, an entry will be added to the syslog.
Important events will contain the action requested, the container's ID, the login UID of the user issuing the request, the process ID, and any initialized configuration settings. In an effort to reduce the size of the log message, uninitialized configuration parameters are not logged.
We have another patch for auditing that we are working on, but it is based on the logging patch. Once the logging patch gets accepted, we will submit the audit patch.
If you want to give docker access to non root users we recommend setting up sudo. Here is a short guide on how to do this.
Add an entry like the following to /etc/sudoers.
grep dwalsh /etc/sudoers
dwalsh ALL=(ALL) NOPASSWD: /usr/bin/docker
This will allow the specified user to run docker as root, without a password.
(NOTE: I do not recommend using NOPASSWD, since it would allow any process on your system to become root. If you require the password, the user needs to specify his password when running the docker command, making the system a bit more secure. Sudo gives you a five-minute grace period to run docker again without a password.)
Set up an alias for running the docker command:
alias docker="sudo /usr/bin/docker"
Now when the user executes the docker command as non-root, it will be allowed and properly logged.
docker run -ti --privileged -v /:/host fedora chroot /host
Look at the journal or /var/log/messages.
journalctl -b | grep docker.*privileged
Aug 04 09:02:56 dhcp-10-19-62-196.boston.devel.redhat.com sudo[23422]: dwalsh : TTY=pts/3 ; PWD=/home/dwalsh/docker/src/github.com/docker/docker ; USER=root ; COMMAND=/usr/bin/docker run -ti --privileged -v /:/host fedora chroot /host
Look at audit log
ausearch -m USER_ROLE_CHANGE -i
type=USER_ROLE_CHANGE msg=audit(08/04/2015 09:02:56.514:1460) : pid=23423 uid=root auid=dwalsh ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='newrole: old-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
new-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe=/usr/bin/sudo hostname=? addr=? terminal=/dev/pts/3 res=success'
Better yet if you wanted to only allow a user to access a particular container, you could write a simple script:
cat /usr/bin/docker-fedora
#!/bin/sh
docker run -ti --rm fedora /bin/sh
Then configure sudoers to run it:
grep dwalsh /etc/sudoers
dwalsh ALL=(ALL) NOPASSWD: /usr/bin/docker-fedora
This user would only be able to run the fedora container, without privileges.
We have other patches that we are working on to make the docker daemon more secure, including authentication. Here is an issue where there is an ongoing discussion of it.
And we are developing a proposal to add Authorization/RBAC (Roles Based Access Control) to docker, to allow administrators to specify which users are allowed to do which activity on which containers/images.
https://github.com/rhatdan/docker-rbac
We believe the security of managing the docker daemon needs a lot of improvement before we can think of opening up access to non-privileged users directly. Until these fixes are made, sudo is the best option.
I believe SELinux is the best security measure we currently have for controlling access between standard docker containers. Of course I might be biased. All of the security separation measures are nice and should be enabled for security in depth, but SELinux policy prevents a lot of break-out situations where the other security mechanisms fail. With SELinux on Docker, we write policy that says that the container process running as svirt_lxc_net_t can only read/write svirt_sandbox_file_t by default (there are some booleans to allow it to write to network shared storage, if required, like NFS). This means that if a process from a docker container broke out of the container, it would only be able to write to files/directories labeled svirt_sandbox_file_t. We take advantage of MCS separation to ensure that the processes running in the container can only write to svirt_sandbox_file_t files with the same MCS label or "s0".
The problem with SELinux and Docker comes up when you are volume mounting content into a docker container.
There are multiple ways to run containers with an SELinux enforced system when sharing content inside of the container.
The SELinux policy for svirt_lxc_net_t also allows the processes to read/execute most of the labels under /usr. This means if you wanted to volume mount an executable from /usr into a container, SELinux would probably allow the processes to execute the commands. If you want to share the same directory with multiple containers such that the containers can read/execute the content, you could label the content as usr_t.
docker run -v /opt:/opt rhel7 ...
If you are sharing parts of the Host OS that can not be relabeled, you can always disable SELinux separation in the container.
docker run --security-opt label:disable rhel7 ...
This means that you can continue to run your system with SELinux enforcing and even run most of your containers locked down, but this one particular container will run with an unconfined type (we use the spc_t type for this). I wrote a blog on Super Privileged Containers (SPC) a while back; these are containers that you don't want to isolate from the system.
http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/
Docker has the ability to automatically relabel content on the disk when doing volume mounts, by appending a :z or :Z to the mount point.
If you want to take a random directory from say /var/foobar and share it with multiple containers such that the containers can write to these volumes, you just need to add the :z to the end of the volume mount.
docker run -v /var/foobar:/var/foobar:z rhel7 ...
If you want to take a random directory from say /var/foobar and have it private to the container such that only that container can write to the volume, you just need to add the :Z to the end of the volume mount.
docker run -v /var/foobar:/var/foobar:Z rhel7 ...
You should be careful when doing this, since the tool will basically run the following:
chcon -R -t svirt_sandbox_file_t /SOURCEDIR
which could cause breakage on your system. Make sure the directory is only to be used with containers. The following is probably a bad idea:
docker run -v /var:/var:Z rhel7
This appears to make an image that VirtualBox can't boot. Using qemu-img convert works fine.
We got a bug report the other day with the following comment:
On a RHEL 7 host (registered and subscribed), I can use Yum to install additional packages from 'docker run ...' or in a Docker file. If I install the 'iputils' package (and any dependencies), the basic 'ping' command does not work. However, if I use the 'busybox' image from the public Docker index, it's ping command works perfectly fine (on same RHEL 7 host).
# docker run -i -t registry.access.redhat.com/rhel7:0-21 bash
# yum install -y iputils
# ping 127.0.0.1
bash: /usr/bin/ping: Operation not permitted
# docker run -ti --rm busybox /bin/sh
# ping google.com
PING google.com (74.125.226.4): 56 data bytes
64 bytes from 74.125.226.4: seq=0 ttl=52 time=44.923 ms
64 bytes from 74.125.226.4: seq=1 ttl=52 time=46.181 ms
^C
My first idea when I saw this was that ping needs a set of capabilities enabled: net_raw and net_admin. By default our containers run without the net_admin capability. What if I added the net_admin capability? Would the container then work?
# docker run -i -t --cap-add net_raw --cap-add net_admin registry.access.redhat.com/rhel7:0-21 bash
# ping google.com
PING google.com (74.125.228.3) 56(84) bytes of data.
64 bytes from iad23s05-in-f3.1e100.net (74.125.228.3): icmp_seq=1 ttl=47 time=11.0 ms
64 bytes from iad23s05-in-f3.1e100.net (74.125.228.3): icmp_seq=2 ttl=47 time=11.1 ms
^C
But I really do not want to have to run a privileged container just so ping will work. Then we got more information on the bugzilla: someone had simply copied the ping executable to another name, and it worked!
# docker run -i -t registry.access.redhat.com/rhel7:0-21 bash
# yum install -y iputils
# ping 127.0.0.1
bash: /usr/bin/ping: Operation not permitted
# mkdir -p /opt/ping
# cp /usr/bin/ping /opt/ping/
# /opt/ping/ping -c1 10.3.1.1
PING 10.3.1.1 (10.3.1.1) 56(84) bytes of data.
64 bytes from 10.3.1.1: icmp_seq=1 ttl=62 time=0.358 ms
...
This told us there must be something about the permissions on the ping command itself.
We eventually figured out that the problem was caused by file capabilities.
In Red Hat based distributions we use file capabilities, which allow applications to start up with a limited set of capabilities, even if launched by root. In other distributions, like busybox and Ubuntu, the ping command is shipped as setuid root. If we executed chmod 4755 /usr/bin/ping in the container, ping would start to work.
# getcap /usr/bin/ping
/usr/bin/ping = cap_net_admin,cap_net_raw+ep
man capabilities
...
File capabilities
Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute (see setxattr(2)) named security.capability. Writing to this extended attribute requires the CAP_SETFCAP capability. The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).
The three file capability sets are:
Permitted (formerly known as forced):
These capabilities are automatically permitted to the thread, regardless of the thread's inheritable capabilities.
Inheritable (formerly known as allowed):
This set is ANDed with the thread's inheritable set to determine which inheritable capabilities are enabled in the permitted set of the thread after the execve(2).
Effective:
This is not a set, but rather just a single bit. If this bit is set, then during an execve(2) all of the new permitted capabilities for the thread are also raised in the effective set. If this bit is not set, then after an execve(2), none of the new permitted capabilities is in the new effective set.
By setting the file capabilities on ping to cap_net_admin,cap_net_raw+ep, when a user executes the ping command the kernel will attempt to gain the cap_net_admin and cap_net_raw capabilities at exec time. If both capabilities are not available to the user, the execution will fail.
It's the execve() of /usr/bin/ping in the first place that is failing:
# strace ping -h
execve("/usr/bin/ping", ["ping", "-h"], [/* 12 vars */]) = -1 EPERM (Operation not permitted)
If however we removed the Effective bit, the application would execute, and would only fail if it tried to make a system call that requires the cap_net_raw capability, instead of failing up front by attempting to raise capabilities that are not available.
# setcap cap_net_raw,cap_net_admin+p /usr/bin/ping
# ping -c1 10.3.1.1
PING 10.3.1.1 (10.3.1.1) 56(84) bytes of data.
64 bytes from 10.3.1.1: icmp_seq=1 ttl=62 time=0.358 ms
...
There is no need for the kernel to automatically add those capabilities on execve(). The ping application is doing that step itself.
We have opened a Bugzilla on iputils to fix this in the package for RHEL and Fedora. So if your application is blowing up for strange reasons within a container, you might want to check out its file capabilities.