projectatomic / atomic-site

Source code for projectatomic.io

Home Page: projectatomic.io

License: Other

Shell 0.53% JavaScript 0.44% CSS 0.32% Ruby 12.99% Python 0.82% Dockerfile 0.39% Haml 60.79% Sass 23.73%

atomic-site's People

Contributors

aweiteka, baude, benhosmer, bexelbie, bproffitt, cdrage, cgwalters, coolbrg, dustymabe, garrett, giuseppe, goern, ipbabble, jasonbrooks, jberkus, jlebon, johnmark, jzb, miabbott, mscherer, nzwulfin, rhamilto, runcom, scollier, scovl, smarterclayton, stefwalter, tomsweeneyredhat, trishnaguha, veillard


atomic-site's Issues

New Blog Submission

Docker logging container output to journald

Docker has added a new feature to allow alternate logging drivers.

Currently, docker-1.6 supports json-file (the old default), syslog, or even none.

If you ran your docker daemon with the --log-driver=syslog option, any output from a container would go directly to the system syslog daemon, and usually end up in /var/log/messages.

I added a patch to support journald as a logging driver, and it was recently merged.

In docker-1.7 you will have the option to use --log-driver=journald on your docker daemon, which sends all of your container logging directly to the journal, or to use --log-driver=journald on a docker run/create command line, so individual containers can run with different logging drivers.

To test this, I set up my docker daemon to run with the following options:

docker -d --selinux-enabled --log-driver=journald

Now I run a simple container.

docker run fedora echo "Dan Walsh was here"

Then I look up the container ID for the newly created container, and I can use journalctl to examine the content.

journalctl MESSAGE_ID=cdf02c627e27
-- Logs begin at Mon 2015-04-06 16:06:42 EDT, end at Fri 2015-04-24 08:40:38 EDT. --
Apr 20 15:06:39 dhcp-10-19-62-196.boston.devel.redhat.com docker[27792]: Dan Walsh was here

Currently the docker logs command only works with the json-file backend, but I hope to eventually get the docker daemon to retrieve this information from journald. Of course, if anyone wants to take a stab at this, I would welcome the effort.

making Container Best Practices push to /docs/ work

Summarizing a couple weeks of emails:

The team writing Container Best Practices wants to enable an automated CI push so the docs show up at /docs/container-best-practices.html. This is a reasonable request, so we'd like to accommodate it. Here's my idea on getting it to work:

  1. we get Middleman to ignore docs/container-best-practices.html for build, but still serve it
  2. give their team a bot login to the atomic-site server
  3. allow their CI to push a new file to that location

The problem is that I'm a bit stuck on how to accomplish some of the above steps, due to my unfamiliarity with Middleman and OpenShift Online v2 permissions. Help?

@tigert @mscherer @garrett @jlebon
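For step 1, Middleman's config.rb supports an `ignore` directive; a hypothetical sketch (the path is taken from the discussion above and not verified against the actual config):

```ruby
# config.rb — hypothetical sketch for step 1: stop Middleman from
# generating this page during `middleman build`, so the CI-pushed
# copy at the same path is not overwritten on deploy.
ignore 'docs/container-best-practices.html'
```

Whether the deployed server keeps serving the CI-pushed file untouched then depends on steps 2 and 3.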

make www redirect work with subdirs

The way it is now:

If I type a subdir without the "www", I get redirected to the projectatomic home page:

http://projectatomic.io/docs

The way it should be:

I should get redirected to the page indicated.

This isn't a big deal, but then it might not be that hard to fix, either.
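For reference, a path-preserving redirect can be sketched as follows, assuming an nginx front end (the actual server setup may differ):

```nginx
# Hypothetical sketch: redirect the bare domain to www while
# preserving the requested path and query string.
server {
    listen 80;
    server_name projectatomic.io;
    return 301 http://www.projectatomic.io$request_uri;
}
```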

BLOG: No New Privileges support in docker

We recently added support for no_new_priv to docker.
It was also added earlier to runc and the Open Container Initiative
spec.
This security feature was added to the Linux kernel back in 2012. A process can set the no_new_priv bit
in the kernel, and it persists across fork, clone and execve. The no_new_priv bit ensures that the process and its
child processes do not gain any additional privileges. A process is not allowed to unset the no_new_priv bit
once it is set. Processes with no_new_priv are not allowed to change uid/gid or gain any other capabilities, even
if the process executes setuid binaries or executables with file capability bits set. no_new_priv also prevents LSMs like SELinux from transitioning to process labels that have access not allowed to the current process. This means an SELinux process is only allowed to transition to a process type with fewer privileges.

For more details, see the kernel documentation.

Here is an example showcasing how it helps in docker:

Create a setuid binary that displays the effective uid

[$ dockerfiles]# cat testnnp.c 
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(int argc, char *argv[])
{
        printf("Effective uid: %d\n", geteuid());
        return 0;
}
[$ dockerfiles]# make testnnp
cc     testnnp.c   -o testnnp

Now we will add the binary to a docker image

[$ dockerfiles]# cat Dockerfile 
FROM fedora:latest
ADD testnnp /root/testnnp
RUN chmod +s /root/testnnp
ENTRYPOINT /root/testnnp

[$ dockerfiles]# docker build -t testnnp .
Sending build context to Docker daemon 12.29 kB
Step 1 : FROM fedora:latest
 ---> 760a896a323f
Step 2 : ADD testnnp /root/testnnp
 ---> 6c700f277948
Removing intermediate container 0981144fe404
Step 3 : RUN chmod +s /root/testnnp
 ---> Running in c1215bfbe825
 ---> f1f07d05a691
Removing intermediate container c1215bfbe825
Step 4 : ENTRYPOINT /root/testnnp
 ---> Running in 5a4d324d54fa
 ---> 44f767c67e30
Removing intermediate container 5a4d324d54fa
Successfully built 44f767c67e30

Now we will create and run a container without no-new-privileges.

[$ dockerfiles]# docker run -it --rm --user=1000  testnnp
Effective uid: 0

This shows that even though you requested a non-privileged user (UID=1000) to run your container,
that user would be able to become root by executing the setuid app in the container image.

Running with no-new-privileges prevents the uid transition while running a setuid binary:

[$ dockerfiles]# docker run -it --rm --user=1000 --security-opt=no-new-privileges testnnp
Effective uid: 1000

As you can see above, the container process is still running as UID=1000, meaning that even if the
image has dangerous code in it, we can still prevent the user from escalating privileges.

If you want to allow users to run images as a non-privileged UID, in most cases you would want to
prevent them from becoming root. no_new_privileges is a great tool for guaranteeing this.

Guestbook example requires kube-dns

The kubernetes guestbook example was updated a week ago with the following: "Rewrite guestbook example to use kube-dns instead of host:port"
kubernetes/kubernetes@4679a37
This means that currently, following the getting started guide results in a non-functional demo system. To make it work, I had to create a replication controller and service by editing the templated yaml.in files from kubernetes/cluster/addons/dns/, then modify KUBELET_ARGS to include cluster_dns and cluster_domain arguments.

BLOG: Getting started with OCI

Getting started with OCI

This post will walk you through the steps to running a runc container using OCI configuration.

We will walk through two examples. One for running a fedora container and another for running a redis container.
There are 3 steps to running a runc container:

  1. Construct a rootfs
  2. Create an OCI configuration
  3. Start the runtime

Getting ocitools

ocitools is a collection of utilities for working with the OCI specification. We are going to make use of the generate utility, which helps generate an OCI configuration for runc with a command line that is similar to docker run.

$ export GOPATH=/some/dir
$ go get github.com/opencontainers/ocitools
$ cd $GOPATH/src/github.com/opencontainers/ocitools
$ make && make install

Note: This will be available as a package on Fedora/RHEL soon, just like runc.

Fedora container

There are various ways to construct a rootfs. Ultimately, it is just a directory with a bunch of files that will be visible
and used inside your container. We will use dnf to construct a rootfs.

First we create a directory for our container:

$ mkdir /runc/containers/fedora/rootfs
$ cd /runc/containers/fedora
$ dnf install --installroot /runc/containers/fedora/rootfs bash coreutils procps-ng iptools

Next we generate a configuration using ocitools.

$ ocitools generate --args bash

This creates a config.json in the current directory. By passing --args we change the command to be run from the default to bash.
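For illustration, a heavily trimmed config.json might look like the sketch below. A real generated file contains many more fields (mounts, capabilities, namespaces, hostname, and so on), and the exact schema depends on the OCI spec version in use:

```json
{
    "ociVersion": "0.5.0",
    "process": {
        "terminal": true,
        "user": { "uid": 0, "gid": 0 },
        "args": [ "bash" ],
        "cwd": "/"
    },
    "root": {
        "path": "rootfs",
        "readonly": false
    }
}
```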

And finally we start a container.

$ runc start fedora
bash-4.3# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:02 ?        00:00:00 bash
root         7     1  0 18:03 ?        00:00:00 ps -ef
bash-4.3# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
bash-4.3# ls -l
total 48
lrwxrwxrwx.   1 root root    7 Feb  3 22:10 bin -> usr/bin
dr-xr-xr-x.   2 root root 4096 Feb  3 22:10 boot
drwxr-xr-x.   5 root root  360 Apr  8 18:02 dev
drwxr-xr-x.  29 root root 4096 Apr  8 17:45 etc
drwxr-xr-x.   2 root root 4096 Feb  3 22:10 home
lrwxrwxrwx.   1 root root    7 Feb  3 22:10 lib -> usr/lib
lrwxrwxrwx.   1 root root    9 Feb  3 22:10 lib64 -> usr/lib64
drwxr-xr-x.   2 root root 4096 Feb  3 22:10 media
drwxr-xr-x.   2 root root 4096 Feb  3 22:10 mnt
drwxr-xr-x.   2 root root 4096 Feb  3 22:10 opt
dr-xr-xr-x. 287 root root    0 Apr  8 18:02 proc
dr-xr-x---.   2 root root 4096 Apr  8 17:43 root
drwxr-xr-x.   5 root root 4096 Apr  8 17:45 run
lrwxrwxrwx.   1 root root    8 Feb  3 22:10 sbin -> usr/sbin
drwxr-xr-x.   2 root root 4096 Feb  3 22:10 srv
dr-xr-xr-x.  13 root root    0 Apr  7 22:19 sys
drwxrwxrwt.   2 root root 4096 Apr  8 17:45 tmp
drwxr-xr-x.  12 root root 4096 Apr  8 17:42 usr
drwxr-xr-x.  19 root root 4096 Apr  8 17:45 var
bash-4.3# 

We can list the running containers by using runc list in another terminal.

$ runc list
ID          PID         STATUS      BUNDLE      CREATED
fedora      7770        running     /runc       2016-04-08T18:02:12.186900248Z

We can exec another process in the container using runc exec

[root@dhcp-16-129 ~]# runc exec fedora sh
sh-4.3# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:02 ?        00:00:00 bash
root        10     0  0 18:04 ?        00:00:00 sh
root        16    10  0 18:04 ?        00:00:00 ps -ef
sh-4.3# exit
exit

Redis container

We create a rootfs using dnf just like we did for Fedora.

$ mkdir /runc/containers/redis/rootfs
$ cd /runc/containers/redis
$ dnf install --installroot /runc/containers/redis/rootfs redis

Next, we generate a configuration using ocitools

$ ocitools generate --args /usr/bin/redis-server --network host

We customize the args to start redis-server, and set the network to host, which means that we will use the host network stack
instead of creating a new network namespace for the container.

Next, we start the redis container

$ runc start redis
1:C 08 Apr 18:08:22.665 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
1:M 08 Apr 18:08:22.667 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
1:M 08 Apr 18:08:22.667 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
1:M 08 Apr 18:08:22.667 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.0.6 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 08 Apr 18:08:22.669 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 08 Apr 18:08:22.669 # Server started, Redis version 3.0.6
1:M 08 Apr 18:08:22.670 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 08 Apr 18:08:22.670 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 08 Apr 18:08:22.670 * DB loaded from disk: 0.001 seconds
1:M 08 Apr 18:08:22.670 * The server is now ready to accept connections on port 6379

We have our redis-server up and running!

We can try to connect to it from a redis-cli by exec'ing into the container.

$ runc exec redis redis-cli
127.0.0.1:6379> 
127.0.0.1:6379> set name mrunal
OK
127.0.0.1:6379> get name
"mrunal"
127.0.0.1:6379> quit

These are some simple examples to get started with OCI. ocitools generate allows customizing the configuration with flags.
We used the host networking stack as a convenience. In a future post, we will delve into OCI hooks, which allow
setting up container networking.

BLOG: Extending SELinux Policy for containers.

Extending SELinux policy for containers

A developer contacted me about building a container which will run as a log aggregator for
fluentd. This needed to be an SPC (super privileged container) that would manage parts of the host system, namely the log files under /var/log.

Being a good, conscientious developer, he wanted to run his application as securely as possible.
The option he wanted to avoid was running the container in --privileged mode, which removes all security
from the container. When he ran his container, SELinux complained about the container processes trying to read the log files.

He asked me if there was a way to run a container where SELinux would allow the access but the container process could still be confined. I suggested that he could disable SELinux protections for just this container, leaving SELinux enforcing on for the other containers and for the host.

docker run -d --security-opt label:disable -v /var/log:/var/log fluentd

We did not like this solution. I believe SELinux provides the best security separation currently available for containers.

Another option we talked about was relabeling the content in the /var/log directories:

docker run -d -v /var/log:/var/log:Z fluentd

The problem with this is that all of the files under /var/log would be relabeled with a container-specific
label (svirt_sandbox_file_t). Other parts of the host system, like logrotate and log scanners, would then be blocked from accessing the log files.

The best option we came up with was to generate a new type to run the container with.

We needed to write a little bit of policy to make this happen. Here is what I came up with:

cat container_logger.te
policy_module(container_logger, 1.0)

virt_sandbox_domain_template(container_logger)
##############################
# virt_sandbox_net_domain(container_logger_t)
gen_require(`
 attribute   sandbox_net_domain;
')

typeattribute container_logger_t sandbox_net_domain;
##############################
logging_manage_all_logs(container_logger_t)

Compile and install the policy

make -f /usr/share/selinux/devel/Makefile container_logger.pp
semodule -i container_logger.pp

Run the container with the new policy.

docker run -d -v /var/log:/var/log --security-opt label:type:container_logger_t --name logger fluentd

Exec into the container to make sure you can read/write the log files.

docker exec -ti logger cat /var/log/messages
docker exec -ti logger touch /var/log/foobar
docker exec -ti logger rm /var/log/foobar

Everything works!

Let's take a closer look at the policy.

policy_module(container_logger, 1.0)

policy_module names the policy and also brings in all standard definitions of policy. All policy type
enforcement files start with this.

virt_sandbox_domain_template(container_logger)

virt_sandbox_domain_template is a template macro that actually creates the container_logger_t type, and
sets up all of the policy so that the docker process (docker_t) can transition to it. It also defines
rules that allow it to manage svirt_sandbox_file_t files, and sets it up to be MCS separated, meaning it
will only be able to use its own content and no other container's content, whether the container is running
as the default type svirt_lxc_net_t or a custom type.

##############################
# virt_sandbox_net_domain(container_logger_t)
gen_require(`
    attribute sandbox_net_domain;
')

typeattribute container_logger_t sandbox_net_domain;
##############################

This section will eventually become an interface, virt_sandbox_net_domain. (I sent a patch to the upstream
selinux-policy package to add this interface.) The new interface just assigns an attribute to container_logger_t. Attributes bring in lots of policy rules; basically, this attribute gives full network access to container_logger_t processes. If your container did not need access to the network, or you wanted to tighten the network ports that container_logger_t would be able to listen on or connect to, you would not use this interface.

logging_manage_all_logs(container_logger_t)

This last interface logging_manage_all_logs gives container_logger_t the ability to manage all of the log
file types. SELinux interfaces are defined and shipped under /usr/share/selinux/devel.

Conclusion

Adding a fairly simple policy module allows us to run the container as securely as possible while still getting the job done.

setting up local docker registry (getting started instructions on projectatomic.io)

Following the 'getting started' instructions at http://www.projectatomic.io/docs/gettingstarted/, I noticed an error when pulling the docker registry image.

e94834ac9522: Error pulling image (latest) from docker.io/registry, ApplyLayer exit status 1 stdout:  stderr: unexpected EOF 
FATA[0038] Error pulling image (latest) from docker.io/registry, ApplyLayer exit status 1 stdout:  stderr: unexpected EOF 
[centos@megacore01 ~]$ sudo docker create -p 5000:5000 -v /var/lib/local-registry:/srv/registry -e STANDALONE=false -e MIRROR_SOURCE=https://registry-1.docker.io -e MIRROR_SOURCE_INDEX=https://index.docker.io -e STORAGE_PATH=/srv/registry --name=local-registry registry
Unable to find image 'registry:latest' locally
latest: Pulling from docker.io/registry

I just think the notes need to be updated to follow these instructions: https://docs.docker.com/registry/deploying/. From what I understand, this deploys the docker registry 2.0 server. When I create that container, the issue seems to clear up, so I ended up using registry 2.0. (I hope that's correct?)

[centos@megacore01 ~]$ sudo docker create -p 5000:5000 \
> -v /var/lib/local-registry:/srv/registry \
> -e STANDALONE=false \
> -e MIRROR_SOURCE=https://registry-1.docker.io \
> -e MIRROR_SOURCE_INDEX=https://index.docker.io \
> -e STORAGE_PATH=/srv/registry \
> --name=local-registry registry:2
Unable to find image 'registry:2' locally
2: Pulling from docker.io/registry
d3a1f33e8a5a: Pull complete 
c22013c84729: Pull complete 
d74508fb6632: Pull complete 
91e54dfb1179: Pull complete 
5c3e6bcaa8b0: Pull complete 
a5b8dc690ce7: Pull complete 
e4aee72fc6c3: Pull complete 
76b7062ceb9a: Pull complete 
6228a99f9630: Pull complete 
e024fb496e6b: Pull complete 
1e847b14150e: Already exists 
docker.io/registry:2: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Digest: sha256:0631faa22077d0494e9ac3d7e90bac3eeb6fd9f579cbf7b87dab1cda85c86f73
Status: Downloaded newer image for docker.io/registry:2
eebd0f819ac4352913e94a654d31a9bc2ea6575e2f7a17632a419430b2697274
[centos@megacore01 ~]$

Service seems to be running fine...

[centos@megacore01 ~]$ sudo systemctl status local-registry
local-registry.service - Local Docker Mirror registry cache
   Loaded: loaded (/etc/systemd/system/local-registry.service; enabled)
   Active: active (running) since Mon 2015-08-31 18:59:24 UTC; 18s ago
 Main PID: 2751 (docker)
   CGroup: /system.slice/local-registry.service
           └─2751 /usr/bin/docker start -a local-registry

Aug 31 18:59:24 megacore01.jinkit.com.novalocal systemd[1]: Started Local Docker Mirror registry cache.
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=warning msg="No HTTP secret provided - generated random secret. This may cause problems with upl...
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="redis not configured" instance.id=ad99935f-6dcd-48b7-8a93-f962deba2318 version=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="using inmemory blob descriptor cache" instance.id=ad99935f-6dcd-48b7-8a93-f...sion=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="listening on [::]:5000" instance.id=ad99935f-6dcd-48b7-8a93-f962deba2318 version=v2.1.1
Aug 31 18:59:26 megacore01.jinkit.com.novalocal docker[2751]: time="2015-08-31T18:59:26Z" level=info msg="Starting upload purge in 16m0s" instance.id=ad99935f-6dcd-48b7-8a93-f962deb...sion=v2.1.1
Hint: Some lines were ellipsized, use -l to show in full.

BLOG: What does --selinux-enabled do?

"What does the --selinux-enabled flag in docker do?"

I recently received an email asking about --selinux-enabled in the docker daemon. I thought others might wonder about this, so I wrote this blog.

I'm currently researching the topic of `--selinux-enabled` in docker and what it is doing when set to TRUE.

From what I'm seeing, it simply will set context and labels to the services (docker daemon) when SELinux is enabled on the system and not using OverlayFS.

But I'm wondering if that is even correct, and if so, what else is happening when setting `--selinux-enabled` to TRUE. 

--selinux-enabled on the docker daemon causes it to set SELinux labels on the containers. Docker reads the contexts file /etc/selinux/targeted/contexts/lxc_contexts for the default context to run containers.

cat /etc/selinux/targeted/contexts/lxc_contexts 
process = "system_u:system_r:svirt_lxc_net_t:s0"
content = "system_u:object_r:virt_var_lib_t:s0"
file = "system_u:object_r:svirt_sandbox_file_t:s0"
sandbox_kvm_process = "system_u:system_r:svirt_qemu_net_t:s0"
sandbox_lxc_process = "system_u:system_r:svirt_lxc_net_t:s0"

Docker by default uses the confined SELinux type svirt_lxc_net_t to isolate the container processes from the host, and it generates a unique MCS label to allow SELinux to prevent one container process from attacking other container processes and their content.

If you don't specify --selinux-enabled, Docker does not execute SELinux code to set labels. When docker launches a container process, the system falls back to the default transition policy. This means the container processes will run as either docker_t or spc_t (depending on the version of policy you have installed). Both of these types are unconfined; SELinux will provide no security separation for these container processes.

In addition, I'm also wondering what the impact will be when `--selinux-enabled` is set to TRUE together with `--icc` set to TRUE. Does it have any impact, or is it unrelated?

Could it be possible that with `--icc` and `--selinux-enabled` set to TRUE we might have challenges when communicating between containers when the connection is not specified (as it should be when `--icc` is set to FALSE)?

Unrelated: --icc=false sets up iptables rules that prevent containers from connecting to each other over the local network. The default SELinux policy allows all network connectivity between containers.

Also, does anybody know what will happen if I run `docker` without `--selinux-enabled` for a while (starting containers, pulling images, etc.) and then restart the daemon with `--selinux-enabled`? Is it possible that this impacts the environment somehow and that certain activities need to be done, or shouldn't this affect the service at all?

The only potential problem I can think of would be volume mounts being mislabeled. If you volume mount content off of the host with -v /source/path:/dest/path:Z and SELinux is disabled, docker will not set up labels for the container. If you later turn on --selinux-enabled, these containers would not be able to read/write the content in the mounted volume. Thinking about this further, I am not sure what labels the containers created while SELinux was disabled would have. These pre-created containers would probably continue to run in an unconfined domain, while newer containers would run confined.

design root page for new docs.projectatomic.io

@tigert I think this is for you.

I'd like to design a root page for the new docs.projectatomic.io. My thought was simply a switchboard with a grid of project icons (with names and a 10-word description), which, when you click on them, brings you to the appropriate set of docs. But I am completely willing to be swayed by someone with actual web design skills to do it another way.

NEW BLOG: Why is Red Hat shipping docker-1.6 versus waiting for docker-1.6.1

We attempt to ship new versions of Project Atomic every 6 weeks. I am in charge of the docker portion of each release, and I also lead the team developing the atomic command. Just because Docker releases a new version does not mean it instantly gets into the RHEL release; we would like to allow our QE team time for testing, and to make sure it is "Enterprise Ready". Towards the end of this 6-week period, Docker released an updated Docker 1.6.1 package fixing a series of CVEs.

Red Hat's Security Response Team analyzed these CVEs and found that none of them were serious enough to be worth waiting for. Of course, should there be any issues of actual importance, Red Hat will do everything it can to have a timely (but tested) asynchronous release.

Trevor Jay from Red Hat Security states

Technically speaking, these don't cross any trust boundaries. Docker images are root-run software. They can drop or restrict permissions and capabilities so that you're protected should they become compromised just like any other software that starts with elevated privileges, but you are inherently trusting the image itself to be well-written (to take advantage of the safeties we provide) and non-malicious.

This is all about trusting the application you install on your system. Sometimes I worry people have the opinion that any piece of software is safe to install as long as it is in a container. I believe Docker is playing whack-a-mole with these vulnerabilities, and preventing all of them is going to be near impossible.

Trevor continues.

Container safety is about restricting what can happen when your application get owned, not about randomly running potential malware.

Hope I don't sound like a broken record, since I have covered this before.

Docker should not be about running random crap from the Internet as root and expecting not to be hacked.

Nulecule deployment is not complete

Guide mentions that I should

configure answers.conf by copying Nulecule/answers.conf.sample to answers.conf

but there is no answers.conf.sample:

$ ll Nulecule/answers.conf.sample
ls: cannot access 'Nulecule/answers.conf.sample': No such file or directory
$ pwd
/home/tt/g/atomic-site

create_post.rb doesn't work

$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- launchy (LoadError)
        from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
        from ./create-post.rb:4:in `<main>'

$ gem install launchy
Fetching: launchy-2.4.3.gem (100%)
Successfully installed launchy-2.4.3
Parsing documentation for launchy-2.4.3
Installing ri documentation for launchy-2.4.3
Done installing documentation for launchy after 1 seconds
1 gem installed

$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- chronic (LoadError)
        from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
        from ./create-post.rb:6:in `<main>'

$ gem install chronic
Fetching: chronic-0.10.2.gem (100%)
Successfully installed chronic-0.10.2
Parsing documentation for chronic-0.10.2
Installing ri documentation for chronic-0.10.2
Done installing documentation for chronic after 1 seconds
1 gem installed

$ ./create-post.rb
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- slop (LoadError)
        from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:54:in `require'
        from ./create-post.rb:7:in `<main>'

$ gem install slop
Fetching: slop-4.2.0.gem (100%)
Successfully installed slop-4.2.0
Parsing documentation for slop-4.2.0
Installing ri documentation for slop-4.2.0
Done installing documentation for slop after 0 seconds
1 gem installed

$ ./create-post.rb
./create-post.rb:15:in `<main>': undefined method `parse!' for Slop:Module (NoMethodError)

BLOG: How to run a more secure non root user container.

I was asked a question about running users inside of a docker container: could they still get privileges?

For more background on Linux capabilities, see: http://linux.die.net/man/7/capabilities

We'll start with a simple container where the primary process is running as root. One can look at the capabilities of the current process via grep Cap /proc/self/status. There is also a capsh utility.

# docker run --rm -ti fedora grep Cap /proc/self/status
CapInh: 00000000a80425fb
CapPrm: 00000000a80425fb
CapEff: 00000000a80425fb
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

Notice that the Effective Capabilities (CapEff) is a non-zero value, which means that the process has capabilities.

Using the pscap tool, I see that the process has these capabilities.

chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap

Now let's run a container as a non-root user using the -u option.

docker run -u 3267 fedora grep Cap /proc/self/status
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

Notice that the Effective capabilities (CapEff) are all zero, but the bounding set of capabilities (CapBnd) is not. This means that if there is a setuid binary included in the image, it would be possible to gain these capabilities. Notice also, not surprisingly, that this number matches the previous container's.

So even though this process is running as non root inside the container, it could potentially run with the same capabilities as above if the image builder included a setuid binary.

chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap

Docker has a nice feature where you can drop all capabilities via --cap-drop=all. Now, if we execute the same container with a non privileged user and drop all capabilities:

# docker run --rm -ti --cap-drop=all -u 3267 fedora grep Cap /proc/self/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000

Now this user cannot gain any capabilities on the system. I would advise almost all Docker users who run their containers with non-privileged users to use this feature. It adds a lot of security to the system.
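The hex masks above can be decoded by hand; here is a minimal bash sketch (the capability bit numbers are from linux/capability.h, not from the post) that tests a few bits of the root container's CapEff value:

```shell
# Decode the CapEff mask shown for the root container above by testing bits.
# Bit numbers from linux/capability.h: CAP_SETUID=7, CAP_NET_RAW=13,
# CAP_SYS_ADMIN=21.
mask=0xa80425fb

has_cap() {                       # has_cap <bit> -> yes/no
  if [ $(( (mask >> $1) & 1 )) -eq 1 ]; then echo yes; else echo no; fi
}

echo "CAP_SETUID:    $(has_cap 7)"    # yes: in docker's default set
echo "CAP_NET_RAW:   $(has_cap 13)"   # yes: in docker's default set
echo "CAP_SYS_ADMIN: $(has_cap 21)"   # no: dropped by default
```

The capsh utility mentioned above can produce the full list directly with capsh --decode=00000000a80425fb.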

fatal: Not a git repository while building site through Docker build

Hi,

I am getting following errors while building Docker build

➜  atomic-site git:(build_fix) ✗ ./docker.sh 
Sending build context to Docker daemon 27.43 MB
Step 1 : FROM fedora:23
 ---> 3944b65d6ed6
Step 2 : MAINTAINER [email protected]
 ---> Using cache
 ---> 6c8d31f469ea
Step 3 : WORKDIR /tmp
 ---> Using cache
 ---> bad340bbe395
Step 4 : RUN dnf install -y tar libcurl-devel zlib-devel patch rubygem-bundler ruby-devel git make gcc gcc-c++ redhat-rpm-config && dnf clean all
 ---> Using cache
 ---> 2e22b2e68693
Step 5 : ADD config.rb /tmp/config.rb
 ---> Using cache
 ---> 5bb930c2556c
Step 6 : ADD Gemfile /tmp/Gemfile
 ---> Using cache
 ---> cd8f7277ee0e
Step 7 : ADD Gemfile.lock /tmp/Gemfile.lock
 ---> Using cache
 ---> e87dbf26294e
Step 8 : ADD lib /tmp/lib
 ---> Using cache
 ---> 79401607dae4
Step 9 : RUN bundle install
 ---> Using cache
 ---> 26b5551e2e95
Step 10 : EXPOSE 4567
 ---> Using cache
 ---> f8152326bd01
Step 11 : ENTRYPOINT bundle exec
 ---> Using cache
 ---> f32491e251fa
Step 12 : CMD middleman server
 ---> Using cache
 ---> 9f6418bb1840
Successfully built 9f6418bb1840
== The Middleman is loading

Updating git submodules...
fatal: Not a git repository (or any of the parent directories): .git

fatal: Not a git repository (or any of the parent directories): .git

Other observations:

  • Opening 127.0.0.1:4567 in a browser shows the running site
  • There is no success message or any indication that the site is now running

Blog: Sharing volumes in Docker with SELinux.

Using volumes with docker can cause problems with SELinux.

When using SELinux for controlling processes within a container, you need to make sure any content that gets volume mounted into the container is readable and potentially
writable depending on the use case.

By default docker container processes run with the system_u:system_r:svirt_lxc_net_t:s0 label. The svirt_lxc_net_t type is allowed to read/execute most content under /usr, but it is
not allowed to use most other types on the system. If you want to volume mount content under /var, for example, into a container, you need to set the labels on this content. We mention this in the docker run man page.

man docker-run
...
       When  using  SELinux,  be  aware that the host has no knowledge of container SELinux policy.
       Therefore, in the above example, if SELinux policy  is  enforced,  the  /var/db  directory is not
       writable to the container. A "Permission Denied" message will occur and an avc: message in
       the host's syslog.

       To  work  around  this, at time of writing this man page, the following command needs to be run
       in order for the  proper  SELinux  policy  type label to be attached to the host directory:

              # chcon -Rt svirt_sandbox_file_t /var/db

This got easier recently since Docker finally merged a patch which will show up in docker-1.7 (we have been carrying the patch in docker-1.6 on RHEL, CentOS and Fedora).
This patch adds support for "z" and "Z" as options on volume mounts (-v).

For example:

  docker run -v /var/db:/var/db:z rhel7 /bin/sh

will automatically run the chcon -Rt svirt_sandbox_file_t /var/db described in the man page.

Even better,

  docker run -v /var/db:/var/db:Z rhel7 /bin/sh

will label the content inside the container with the exact MCS label that the container will run with. Basically, it runs:

  chcon -Rt svirt_sandbox_file_t -l s0:c1,c2 /var/db

where s0:c1,c2 differs for each container.

A bugzilla that was reported to me illustrates a problem that might become common.

https://bugzilla.redhat.com/show_bug.cgi?id=1230098

The user got AVCs that looked like the following.

Raw Audit Messages

type=AVC msg=audit(1433926625.524:1347): avc: denied { write } for pid=29280 comm="launch" name="addons" dev="dm-2" ino=2491404 scontext=system_u:system_r:svirt_lxc_net_t:s0:c147,c266 tcontext=system_u:object_r:svirt_sandbox_file_t:s0:c372,c410 tclass=dir permissive=0

Notice the MCS labels: s0:c147,c266 on the process versus s0:c372,c410 on the file.

The problem here was that the user created a volume and labeled it with "Z" for one container, then
attempted to share it with another container. SELinux denied this, since the MCS labels differed, as I described in the bugzilla.

I told the reporter that he needs to mount it using Z or z, since at some point the volume got a different container's labels on it.
Your container processes are running with the label system_u:system_r:svirt_lxc_net_t:s0:c147,c266 (notice the s0:c147,c266), while
the volume mount is labeled system_u:object_r:svirt_sandbox_file_t:s0:c372,c410 (notice the MCS label s0:c372,c410). MCS
security prevents read/write of content with different MCS labels; however, it will allow read/write of content labeled with an s0 label.

If you volume mount an image with -v /SOURCE:/DESTINATION:z, docker will automatically relabel the content for you to s0. If
you volume mount with a "Z", then the label will be specific to the container, and cannot be shared between containers.
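The denial in the AVC above can be seen mechanically by comparing the MCS portion (the fifth colon-separated field) of the two labels; a small shell sketch:

```shell
# The process and file labels from the AVC message above.
scontext="system_u:system_r:svirt_lxc_net_t:s0:c147,c266"
tcontext="system_u:object_r:svirt_sandbox_file_t:s0:c372,c410"

mcs() { echo "$1" | cut -d: -f5; }   # the categories after the s0 level

if [ "$(mcs "$scontext")" = "$(mcs "$tcontext")" ]; then
  echo "MCS categories match: access allowed"
else
  echo "MCS mismatch ($(mcs "$scontext") vs $(mcs "$tcontext")): access denied"
fi
```

This is only the category-equality part of the check; real MCS enforcement also allows access when the file carries the plain s0 level, which is why relabeling with :z (s0) makes the volume shareable.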

No MAINTAINERS File

Please add a maintainers file with who can commit and who can rebuild the site. This will be useful for people to be able to ping the right people with issues.

Docker drop in service for flannel setup incorrect

The documentation on http://www.projectatomic.io/docs/gettingstarted/ mentions adding a systemd "drop-in" to override docker. However, the example file is incorrect and yields the following error when starting up docker.

Aug 07 23:05:45 atomic01.kaos.realm systemd[1]: docker.service has more than one ExecStart= setting, which is only allowed for Type=oneshot services. Refusing.

There is an open bug on the docker GitHub that describes how to fix this issue.

moby/moby#14491

The solution is simply to add an empty

ExecStart=

before the full ExecStart= line. I've tested this locally and it does indeed work as expected.
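For reference, a minimal sketch of a corrected drop-in. The file path and the flannel ExecStart line here are illustrative (take the real one from the getting-started guide); the fix is only the empty ExecStart= line that clears the setting inherited from docker.service:

```ini
# /etc/systemd/system/docker.service.d/flannel.conf  (illustrative path)
[Service]
# The empty assignment clears the ExecStart inherited from docker.service;
# without it, systemd refuses with "more than one ExecStart= setting".
ExecStart=
ExecStart=/usr/bin/docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
```

After editing the drop-in, run systemctl daemon-reload before restarting docker.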

Deploy without ImageMagick?

@tigert, @garrett

I was building an atomicapp for doing deploys. However, the deploy builds a couple of icons, and requires ImageMagick, which in Fedora pulls in all of X11. Is there a way around this?

Social Media Buttons?

We should probably try to add social media buttons so people can easily tweet, share on Facebook, etc.

Adding tigert to the ticket for discussion. IIRC there are some ways to do this, but some of them involve some objectionable JS.

Blog: It has come to my attention recently that zfs has been added as a new backend for docker.

I received a bug report that it was not working with SELinux. It turns out there was a simple fix, which should get into docker-1.7 or docker-1.8, to fully support labeling within a docker container using the ZFS backend.

Sadly, ZFS cannot be merged into the Linux kernel because of licensing issues. But if you are willing to use loadable kernel modules, you can now run your containers on ZFS with SELinux enforcement.

New Blog submission.

Atomic command standard environment substitution.

I recently published an article on the new Atomic command.

One of the questions I got was about command substitution. The user of the tool wanted standard bash substitutions to work. Specifically, he wanted to allow substitutions of $PWD/.foobar and --user=$(id -u).

I decided to try this out by creating a simple Dockerfile.

from rhel7
LABEL RUN echo $PWD/.foobar --user=$(id -u)

Build the container

docker build -t test .

Execute atomic run test

atomic run test
echo 

Looking at the label using docker inspect, I see that building the container dropped the $() content.

Changing it to quote the values:

from rhel7
LABEL RUN echo '$PWD/.foobar' '--user=$(id -u)'

Build the container

docker build -t test .

Execute atomic run test

atomic run test
echo $PWD/.foobar --user=$(id -u)
/root/test1/.foobar --user=0

Woo Hoo it works.
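The behavior can be sketched with plain shell quoting, outside docker entirely (the template variable here stands in for the quoted LABEL value; nothing below is docker-specific):

```shell
# Single quotes store the text verbatim, the way the quoted LABEL does;
# expansion is deferred until something evals it, the way `atomic run` does.
template='$PWD/.foobar --user=$(id -u)'
echo "stored:   $template"

expanded=$(eval echo "$template")    # run-time expansion of $PWD and $(id -u)
echo "expanded: $expanded"
```

Without the single quotes, the expansion would already have happened when the template was assigned, which is the analogue of docker build dropping the $() content at build time.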

"Running the atomic.app" steps not working

Hi,

I was going through building the site and found the "Running the atomic.app" steps are not working for me.

I did all necessary steps mentioned in First time setup.
I am getting some kind of info message, not an error.

# answers.conf
➜  atomic-site git:(master) ✗ cat Nulecule/answers.conf
[atomic-site]
datadir = /home/budhram/redhat/atomic-site/data
hostport = 4567
sourcedir = /home/budhram/redhat/atomic-site/source 
image = jberkus/atomic-site
provider = kubernetes
[general]
namespace = default
provider = kubernetes

➜  atomic-site git:(master) ✗ sudo atomicapp run Nulecule 
[INFO] - main.py - Action/Mode Selected is: run
[INFO] - kubernetes.py - Using namespace default
[INFO] - kubernetes.py - trying kubectl at /usr/bin/kubectl
[INFO] - kubernetes.py - found kubectl at /usr/bin/kubectl
[INFO] - kubernetes.py - Deploying to Kubernetes

Your application resides in Nulecule
Please use this directory for managing your application

Anything wrong in my setup?

Adjust blockquote CSS

... currently it doesn't indent enough, so that blockquotes look like just a font change.

Blog: Cool new feature merged into docker: --tmpfs support

Developer Mode vs Production Mode

One of the reasons Docker has taken off is that it made it easier for developers to ship and update their software.
It streamlined the development process and allowed developers to choose the runtime environments that their applications
are going to run in. The runtime/userspace that the developer chooses gets tested by the QE team, and the exact same runtime gets executed in production.

Part of the development process is usually built around updating the application and the userspace. The developer does things like yum or dnf install, and then copies in code that is particular to his application. But once the developer is done, he usually expects QE and production to treat this content as read-only. I believe that the image of a container should be put into production in read-only mode, which would prevent the application or processes within the container from writing to the container image; they would only be allowed to write to volumes mounted into the container.

Security

From a security point of view, this is great. Imagine you are running an application that gets hacked. The first thing the hacker wants to do is to write his hack into the application, so that the next time the application starts up, it starts up with the hacker's code. If the image were read-only, the hacker would be prevented from leaving his back door. (He would have to break in again.) Docker added a feature to handle this via docker run -d --read-only image ... a while ago. But it is difficult to use, since a lot of applications need to write to temporary directories like /run or /tmp; because these are
read-only, the apps fail. You could set up temporary locations on your host to volume mount into the container, but this ended up exposing temporary data to the host.

I wanted to be able to mount a tmpfs into the container. I have been working on a fix for this since last May, and it was finally merged on 12/2/15. moby/moby#13587

It will show up in docker-1.10.

With this patch you can set up a tmpfs on /run and /tmp.

docker run -d --read-only --tmpfs /run --tmpfs /tmp IMAGE

I would actually recommend applications in production run with this mode.

You might want to continue to mount in volumes from the host for permanent data.

Other cool stuff.

One cool thing about this patch is that it tars up the contents of the underlying directory in the image on top of the
tmpfs. So if you have a /run/httpd directory in the container image, you will have /run/httpd in the container's tmpfs.

You can also do some other stuff with this patch like setting up a temporary /var or /etc inside of your container.

If you execute

docker run -ti --tmpfs /etc fedora /bin/sh

It will mount a tmpfs on /etc, and tar up the content of the underlying /etc onto the new /etc tmpfs. This means you can
make changes to /etc, but they will not survive a container stop; the container will be fresh every time you start it.

You can also pass tmpfs mount options on the command line

docker run -ti --tmpfs /etc:rw,noexec,nosuid,size=2g container

Docker will pass down these mount options.
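From inside the container you can confirm the mount by reading /proc/self/mounts; here is a small helper (the check_fstype name is mine; nothing below is docker-specific, so it also runs on the host):

```shell
# Report the filesystem type mounted at a given path, or "not mounted".
# Inside `docker run --read-only --tmpfs /tmp ...` this prints "tmpfs" for /tmp.
check_fstype() {
  awk -v mp="$1" '$2 == mp { print $3; found=1 }
                  END { if (!found) print "not mounted" }' /proc/self/mounts
}

check_fstype /tmp
check_fstype /no/such/mountpoint
```

The same check works for any of the --tmpfs targets discussed above, such as /run or /etc.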

Call to action

It would be nice in the future if developers told people to run in production with the --read-only docker run/create flags. Or better yet, set up atomic labels to do this by default. Then we can separate the way a container application runs in development from the way it runs in production.

No privacy policy is linked on the website

Since we are starting to get more and more infrastructure, I just noticed that we do not have any privacy policy explaining what we are doing with the information and everything.

Few things broken with docs.projectatomic.io as well as registry documentation

Clicking Quick Start on http://www.projectatomic.io/registry/ brings me to http://docs.projectatomic.io/registry/latest/registry_quickstart/index.html, which makes the first three links on the sidebar not work (Get Started, Documentation, and Developer Community).

Under http://docs.projectatomic.io/registry/latest/install_config/syncing_groups_with_ldap.html, where it says "LDAP Client Configuration", the formatting is very hard to read (purple key names, light red key values). There are a few other places where this is happening too.

Blog Post for Docker Brno Meetup - 15 June 2016

Draft is here - and needs review please:

PR will come tomorrow (Friday 17 June). I will be traveling and out of pocket from 18 June so I need to get this to a point where it can be merged easily.

http://etherpad.osuosl.org/4tlMydVGzp

Open Issues:

1 - I need a place to post some slides from one of the speakers. Can I put them into this repo?

2 - I need to put some pictures into this post. @eliskasl is going to provide some when she gets them off of her camera.

@jberkus @jzb @bproffitt

BLOG: Why doesn't Red Hat Enterprise Linux, Fedora, Centos allow docker to be run by non root users?

Why don't Red Hat Enterprise Linux, Fedora, and CentOS want docker to be run directly by non-root users?

I often get bug reports asking:

Why can't I use docker as a non-root user by default?

Docker has the ability to change the group ownership of /run/docker.sock to the docker group, with permissions of 660. This would allow users added to the docker group to run docker containers without having to execute sudo or su to become root. Sounds great...

ls -l /var/run/docker.sock 
srw-rw----. 1 root docker 0 Aug  3 13:02 /var/run/docker.sock

BUT

On RHEL, Fedora and CentOS we prefer to have the docker.socket set like:

ls -l /var/run/docker.sock 
srw-rw----. 1 root root 0 Aug  3 13:02 /var/run/docker.sock

If a user can talk to the docker socket, they can execute the following command:

docker run -ti --privileged -v /:/host fedora chroot /host

Giving them full root access to the host system.

It is similar to giving them the following in sudo.

grep dwalsh /etc/sudoers
dwalsh  ALL=(ALL)   NOPASSWD: ALL

This would allow them to run sudo sh and get the same access. But there is one big flaw in the comparison.

Docker has no auditing or logging built in, while sudo does.

Docker currently records events but the events disappear when the docker daemon is restarted. Docker does not currently do any auditing.

From a security perspective, Red Hat has expressed concerns with enabling access to the docker daemon from non-root users, absent auditing and proper logging. We've implemented those controls in PR14446 and are awaiting merge. Short term, it is recommended to implement sudo rules to permit access to the docker daemon. Sudo will then provide logging and audit.

moby/moby#14446

This patch provides much needed system logging for docker's API functions. With this patch, when an API request is made, an entry will be added to the syslog.

Important events will contain the action requested, the container's ID, the login UID of the user issuing the request, the process ID, and any initialized configuration settings. In an effort to reduce the size of the log message, uninitialized configuration parameters are not logged.

We have another patch, for auditing, that we are working on, but it is based on the logging patch. Once the logging patch gets accepted, we will submit the audit patch.

Setting up sudo

If you want to give docker access to non-root users, we recommend setting up sudo. Here is a short guide on how to do this.

Add an entry like the following to /etc/sudoers.

grep dwalsh /etc/sudoers
dwalsh        ALL=(ALL)       NOPASSWD: /usr/bin/docker

This will allow the specified user to run docker as root, without a password.

(NOTE: I do not recommend using NOPASSWD; that would allow any process on your system to become root. If you require the password, the user needs to enter his password when running the docker command, making the system a bit more secure. Sudo gives you a five-minute grace period to run docker again without re-entering the password.)

Set up an alias for running the docker command

alias docker="sudo /usr/bin/docker"

Now when the user executes the docker command as non-root, it will be allowed and properly logged.

docker run -ti --privileged -v /:/host fedora chroot /host

Look at the journal or /var/log/messages.

journalctl -b | grep docker.*privileged
Aug 04 09:02:56 dhcp-10-19-62-196.boston.devel.redhat.com sudo[23422]:   dwalsh : TTY=pts/3 ; PWD=/home/dwalsh/docker/src/github.com/docker/docker ; USER=root ; COMMAND=/usr/bin/docker run -ti --privileged -v /:/host fedora chroot /host

Look at audit log

ausearch -m USER_ROLE_CHANGE -i
type=USER_ROLE_CHANGE msg=audit(08/04/2015 09:02:56.514:1460) : pid=23423 uid=root auid=dwalsh ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='newrole: old-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
new-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe=/usr/bin/sudo hostname=? addr=? terminal=/dev/pts/3 res=success'

Better Security

Better yet, if you want to only allow a user access to a particular container, you can write a simple script:

cat /usr/bin/docker-fedora
#!/bin/sh
docker run -ti --rm fedora /bin/sh

Then configure sudoers to run it:

grep dwalsh /etc/sudoers
dwalsh        ALL=(ALL)       NOPASSWD: /usr/bin/docker-fedora

This user would only be able to run the fedora container, without privileges.
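The same idea can be hardened a little further. A hypothetical variant (the run_fedora name and the argument check are mine, not from the post) pins the image and docker flags and only lets the caller choose the command:

```shell
# Hypothetical wrapper: image and docker flags are pinned; the caller may
# only choose the command run inside the container, and may not pass flags.
run_fedora() {
  case "$1" in
    -*) echo "refusing flag argument: $1" >&2; return 1 ;;
  esac
  # `echo` for illustration; a real wrapper would exec docker here.
  echo docker run -ti --rm fedora "${1:-/bin/sh}"
}

run_fedora bash
run_fedora --privileged || true   # refused with an error message
```

A sudoers entry would then whitelist this script instead of /usr/bin/docker, so the user cannot smuggle in --privileged or extra volume mounts.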

Authentication

We have other patches that we are working on to make the docker daemon more secure, including authentication. Here is an issue with an ongoing discussion on it.

moby/moby#13697

Authorization

And we are developing a proposal to add authorization/RBAC (Role-Based Access Control) to docker, to allow administrators to specify which users are allowed to do which activity on which containers/images.

https://github.com/rhatdan/docker-rbac

Conclusion

We believe the security of managing the docker daemon needs a lot of improvement before we can think of opening up access to non-privileged users directly. Until these fixes are made, sudo is the best option.

BLOG: SELinux and Containers

SELinux and Containers.

I believe SELinux is the best security measure we currently have for controlling access between standard docker containers. Of course, I might be biased. All of the security separation measures are nice and should be enabled for security in depth, but SELinux policy prevents a lot of break-out situations where the other security mechanisms fail.

With SELinux on Docker, we write policy that says the container process, running as svirt_lxc_net_t, can only read/write svirt_sandbox_file_t by default. (There are some booleans to allow it to write to network shared storage such as NFS, if required.) This means that if a process from a docker container broke out of the container, it would only be able to write to files/directories labeled svirt_sandbox_file_t. We take advantage of MCS separation to ensure that the processes running in the container can only write to svirt_sandbox_file_t files with the same MCS label, or "s0".

The problem with SELinux and Docker comes up when you are volume mounting content into a docker container.

It depends on the use case.

There are multiple ways to run containers with an SELinux enforced system when sharing content inside of the container.

The SELinux policy for svirt_lxc_net_t also allows the processes to read/execute most of the labels under /usr. This means that if you wanted to volume mount an executable from /usr into a container, SELinux would probably allow the processes to execute the commands. If you want to share the same directory with multiple containers such that the containers can read/execute the content, you could label the content as usr_t.

docker run -v /opt:/opt rhel7 ...

If you are sharing parts of the host OS that cannot be relabeled, you can always disable SELinux separation in the container.

docker run --security-opt label:disable rhel7 ...

This means that you can continue to run your system with SELinux enforcing and even run most of your containers locked down, but this one particular container will run with an unconfined type. (We use the spc_t type for this.) I wrote a blog on Super Privileged Containers (SPC) a while back; these are containers that you don't want to isolate from the system:
http://developerblog.redhat.com/2014/11/06/introducing-a-super-privileged-container-concept/

Docker has the ability to automatically relabel content on the disk when doing volume mounts, by appending a :z or :Z on to the mount point.

If you want to take a random directory from say /var/foobar and share it with multiple containers such that the containers can write to these volumes, you just need to add the :z to the end of the volume mount.

docker run -v /var/foobar:/var/foobar:z rhel7 ...

If you want to take a random directory from say /var/foobar and have it private to the container such that only that container can write to the volume, you just need to add the :Z to the end of the volume mount.

docker run -v /var/foobar:/var/foobar:Z rhel7 ...

You should be careful when doing this, since the tool will basically run the following:

chcon -R -t svirt_sandbox_file_t /SOURCEDIR

This could cause breakage on your system, so make sure the directory is only to be used with containers. The following is probably a bad idea.

docker run -v /var:/var:Z rhel7

Using these few easy commands makes SELinux and containers work well together and give you the best security separation possible for default containers.

Blog on ping and file capabilities.

Problem Statement

We got a bug report the other day with the following comment:

On a RHEL 7 host (registered and subscribed), I can use Yum to install additional packages from 'docker run ...' or in a Docker file. If I install the 'iputils' package (and any dependencies), the basic 'ping' command does not work. However, if I use the 'busybox' image from the public Docker index, it's ping command works perfectly fine (on same RHEL 7 host).

Steps to Reproduce:

# docker run -i -t registry.access.redhat.com/rhel7:0-21 bash
# yum install -y iputils
# ping 127.0.0.1
bash: /usr/bin/ping: Operation not permitted
# docker run -ti --rm busybox /bin/sh
# ping google.com
PING google.com (74.125.226.4): 56 data bytes
64 bytes from 74.125.226.4: seq=0 ttl=52 time=44.923 ms
64 bytes from 74.125.226.4: seq=1 ttl=52 time=46.181 ms
^C

This was not an SELinux issue, since it happened even with the machine in permissive mode.

My first idea when I saw this was that ping needs a set of capabilities
enabled: net_raw and net_admin. By default our containers run without the
net_admin capability.

What if I added the net_admin capability; would the container work?

# docker run -i -t --cap-add net_raw --cap-add net_admin registry.access.redhat.com/rhel7:0-21 bash
# ping google.com
PING google.com (74.125.228.3) 56(84) bytes of data.
64 bytes from iad23s05-in-f3.1e100.net (74.125.228.3): icmp_seq=1 ttl=47 time=11.0 ms
64 bytes from iad23s05-in-f3.1e100.net (74.125.228.3): icmp_seq=2 ttl=47 time=11.1 ms
^C

But I really do not want to have to run a privileged container just so ping
will work. Then we got more information on the bugzilla: another engineer tried
just copying the ping command to another path, and it worked!

# docker run -i -t registry.access.redhat.com/rhel7:0-21 bash
# yum install -y iputils
# ping 127.0.0.1
bash: /usr/bin/ping: Operation not permitted
# mkdir -p /opt/ping
# cp /usr/bin/ping /opt/ping/
# /opt/ping/ping -c1 10.3.1.1
PING 10.3.1.1 (10.3.1.1) 56(84) bytes of data.
64 bytes from 10.3.1.1: icmp_seq=1 ttl=62 time=0.358 ms
...

This told us there must be something about the permissions on the ping command itself.

We eventually figured out the problem was caused by the use of file capabilities.

In Red Hat based distributions we use file capabilities, which allow
applications to start up with a limited set of capabilities, even if launched
by root. In other distributions, like busybox and Ubuntu, the ping command is
shipped setuid root. If we executed chmod 4755 /usr/bin/ping in the container, ping would start to work.

# getcap  /usr/bin/ping
/usr/bin/ping = cap_net_admin,cap_net_raw+ep
man capabilities
...
   File capabilities
       Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8).  The file capability sets are stored in an extended attribute (see setxattr(2)) named security.capability.  Writing  to  this  extended  attribute  requires  the CAP_SETFCAP capability.  The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).

       The three file capability sets are:

       Permitted (formerly known as forced):
              These capabilities are automatically permitted to the thread, regardless of the thread's inheritable capabilities.

       Inheritable (formerly known as allowed):
              This set is ANDed with the thread's inheritable set to determine which inheritable capabilities are enabled in the permitted set of the thread after the execve(2).

       Effective:
              This is not a set, but rather just a single bit.  If this bit is set, then during an execve(2) all of the new permitted capabilities for the thread are also raised in the effective set.  If this bit is not set, then after an execve(2),  none  of  the  new permitted capabilities is in the new effective set.

By setting the file capabilities on ping to cap_net_admin,cap_net_raw+ep, when a user executes the ping command the kernel will attempt to raise the cap_net_admin and cap_net_raw capabilities. If both capabilities are not available to the user, the execution will fail.

It's the execve() of /usr/bin/ping in the first place that is failing:

# strace ping -h
execve("/usr/bin/ping", ["ping", "-h"], [/* 12 vars */]) = -1 EPERM (Operation not permitted)

If, however, we remove the Effective bit, the application will execute, and would only fail if it tried to execute a system call that requires the cap_net_raw capability, due to it attempting to raise capabilities that are not available.

# setcap cap_net_raw,cap_net_admin+p /usr/bin/ping
# ping -c1 10.3.1.1
PING 10.3.1.1 (10.3.1.1) 56(84) bytes of data.
64 bytes from 10.3.1.1: icmp_seq=1 ttl=62 time=0.358 ms
...

+e should be removed from all binaries which call capset().

There is no need for the kernel to automatically add those capabilities on execve(). The ping application is doing that step itself.

We have opened a Bugzilla on iputils to fix this in the package for RHEL and Fedora. So if your application is blowing up for
strange reasons within a container, you might want to check out its file capabilities.
