[Linux] LXC container: from privileged to unprivileged

In a previous article, I showed how to preserve the integrity of your Linux machine by installing unfriendly software in an LXC container.

The container was a "privileged" container, meaning that the user ids in the container map directly to the user ids of the host. This can easily be confirmed by running ps on the host: the root user of the container is the root user of the host.

In this article, we'll see how to create an unprivileged container. I'll assume that you have read my previous article. In particular, I'll use jeedom1 as the name of the container; don't forget to replace it with the name of your own container.

What is an unprivileged LXC container?

I think the best definition is found in this answer on StackOverflow:

Unprivileged LXC containers are the ones making use of user namespaces (userns). I.e. of a kernel feature that allows to map a range of UIDs on the host into a namespace inside of which a user with UID 0 can exist again.

Contrary to my initial perception of unprivileged LXC containers for a while, this does not mean that the container has to be owned by an unprivileged host user. That is only one possibility.

Owned by user or by root?

So unprivileged containers can be owned by a regular user or by root. If you're like me, you probably want to take the most secure route: make the container owned by a regular user.

Well... I tried this way, but it's not as good as you expect:

  1. There is a lot more configuration
  2. Not all templates are compatible
  3. The autostart feature is not available

After one or two hours trying this way, I decided to take the other road, and that was much better.

Subordinate ids

So the container is owned by root, but the users of the container won't match the users of the host. In other words, a process run with user id 0, from the point of view of the container, will actually be executed with a different user id on the host. The same happens for the other users of the container.

On the host machine, this new id range will not match any actual users. Instead, it relies on subordinate user ids and group ids, subuid and subgid for short.

Each user of the host can have one or more ranges of subordinate ids. They are defined in /etc/subuid and /etc/subgid.

Add subordinate ids to root

So to allow root to run an unprivileged container, we first need to add a subordinate id range.

Edit /etc/subuid and add the following line:

root:1000000:65536

Do the same with /etc/subgid

This will allow root to use 65536 new user and group ids, from 1000000 to 1065535.

As far as I know, it's not possible to add comments in these files.
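If you prefer to script the two edits above, here is a minimal sketch. The function names add_subid_range and subid_span are my own (they are not part of LXC or of the shadow tools), and I assume the files use the user:start:count format shown above. The first function appends the line only if it is missing, the second prints the first and last id of each range so you can double-check what you granted.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of LXC or shadow-utils.

# add_subid_range FILE USER START COUNT
# Appends "USER:START:COUNT" to FILE unless that exact line already exists.
add_subid_range() {
    file=$1
    line="$2:$3:$4"
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# subid_span FILE USER
# Prints "first-last" for every range that USER owns in FILE.
# The last id of a range is START + COUNT - 1.
subid_span() {
    awk -F: -v user="$2" '$1 == user { print $2 "-" ($2 + $3 - 1) }' "$1"
}

# Intended use, from a root shell:
#   add_subid_range /etc/subuid root 1000000 65536
#   add_subid_range /etc/subgid root 1000000 65536
#   subid_span /etc/subuid root
```

Running add_subid_range twice with the same arguments leaves a single line in the file, so the script is safe to replay.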

Create the LXC container

Here, nothing changed compared to the privileged container:

sudo lxc-create -n jeedom1 -t download -- -d ubuntu -r vivid -a amd64

Edit container's configuration

We now arrive at the critical part of this tutorial: we need to tell LXC that we want to map the users of the container to the subordinate ids we've just defined.

Edit /var/lib/lxc/jeedom1/config:

# Map user and group ids
lxc.include = /usr/share/lxc/config/ubuntu.userns.conf
lxc.id_map = u 0 1000000 65536
lxc.id_map = g 0 1000000 65536

As you can see we added two lxc.id_map instructions to map the user ids and the group ids.

We also added an lxc.include instruction to load the user namespace settings that ship with the container template. If you use another template, you have to include another file, like centos.userns.conf for instance.
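To see what the two lxc.id_map lines mean in practice, here is a small sketch that translates an id seen inside the container into the id used on the host. The function name host_id is hypothetical, and I assume the lines are written exactly as above, with spaces around the equals sign: type, first id in the container, first id on the host, number of ids.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not an LXC command.

# host_id CONFIG TYPE ID
# TYPE is "u" (user) or "g" (group); ID is an id as seen inside the container.
# Prints the matching host id, or fails if no lxc.id_map entry covers ID.
host_id() {
    awk -v type="$2" -v id="$3" '
        # Fields: lxc.id_map = <u|g> <container start> <host start> <count>
        $1 == "lxc.id_map" && $2 == "=" && $3 == type &&
        id >= $4 && id < $4 + $6 {
            print $5 + (id - $4)
            found = 1
            exit
        }
        END { exit !found }
    ' "$1"
}

# Intended use (reading the config may require root):
#   host_id /var/lib/lxc/jeedom1/config u 0
#   host_id /var/lib/lxc/jeedom1/config g 20
```

With the mapping above, user 0 of the container becomes 1000000 on the host and group 20 becomes 1000020. An id outside the 65536 mapped ids has no host equivalent, so the function returns an error.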

Change the owner of rootfs/

If you try to start the container right now, it won't work because it won't be able to read its own filesystem (remember that the user id 0 of the container is mapped to the user id 1000000 of the host).

So we need to change the owner of the file system so that it matches the root user of the container.

sudo chown -R 1000000:1000000 /var/lib/lxc/jeedom1/rootfs

We also make sure that this user can have access to this folder:

sudo chmod 755 /var/lib/lxc
sudo chmod 755 /var/lib/lxc/jeedom1
sudo chmod 640 /var/lib/lxc/jeedom1/config
sudo chmod 750 /var/lib/lxc/jeedom1/rootfs
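One caveat with the recursive chown above: it hands every file to the root user of the container, including files that the template created for other users and groups (/etc/shadow, for instance, normally belongs to the shadow group). A possible alternative is to add the offset to the current owner of each file instead. The sketch below is only a dry run: it prints the new owner of each file and changes nothing. The name plan_shift is hypothetical, and I assume GNU find and stat, a freshly created rootfs that has not been shifted yet, and file names without newlines.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it only prints a plan.

# plan_shift ROOTFS OFFSET
# Prints "NEWUID:NEWGID<TAB>PATH" for every file under ROOTFS, where
# NEWUID and NEWGID are the current owner ids plus OFFSET.
plan_shift() {
    find "$1" -exec stat -c '%u %g %n' {} + |
    awk -v off="$2" '{
        uid = $1 + off
        gid = $2 + off
        # Everything after the two ids is the path (it may contain spaces).
        path = $0
        sub(/^[0-9]+ [0-9]+ /, "", path)
        printf "%d:%d\t%s\n", uid, gid, path
    }'
}

# Intended use, from a root shell:
#   plan_shift /var/lib/lxc/jeedom1/rootfs 1000000 | less
```

Applying the plan would mean running chown -h with each printed owner on the matching path, as root. I only show the dry run here because it is easy to review before changing anything.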

Change the owner of devices

In my previous article, I showed how to use /dev/ttyACM0 from a privileged container. But our new container is not allowed to access that device, so we need to change the owner and group of the device.

Create the file /etc/udev/rules.d/99-zwave.rules:

# 0658:0200 Sigma Designs, Inc.
SUBSYSTEM=="tty", ATTRS{idVendor}=="0658", ATTRS{idProduct}=="0200", \
  SYMLINK+="zwave%n", OWNER="1000000", GROUP="1000020"

This rule is specific to the ZWave adapter I'm using, so you obviously need to update it to match your hardware.
The important parts here are OWNER="1000000", which means the device will be owned by the root user of the container, and GROUP="1000020", which means the device group will be the dialout group of the container (gid 20 in the container, shifted by 1000000).
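If your container uses another group, or if dialout has another gid in your distribution, the value for GROUP can be computed from the group file of the container. This is a minimal sketch; container_gid_on_host is a hypothetical name, and the offset is the first host id of the mapping (1000000 in this article).

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# container_gid_on_host GROUP_FILE GROUP_NAME OFFSET
# Looks up GROUP_NAME in GROUP_FILE (same format as /etc/group) and
# prints its gid plus OFFSET. Fails if the group does not exist.
container_gid_on_host() {
    awk -F: -v name="$2" -v off="$3" '
        $1 == name { print $3 + off; found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Intended use, from a root shell (the rootfs is not readable by regular users):
#   container_gid_on_host /var/lib/lxc/jeedom1/rootfs/etc/group dialout 1000000
```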

To trigger the new rule, run:

sudo udevadm control --reload
sudo udevadm trigger
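A udev rule that doesn't match fails silently, so it's worth checking the result. Here is a small sketch that compares the numeric owner and group of a path with the expected values. The name check_owner is hypothetical, and I assume GNU stat. I also assume that the adapter shows up as ttyACM0, in which case the symlink created by the rule should be /dev/zwave0.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# check_owner PATH OWNER_ID GROUP_ID
# Succeeds when PATH (after following symlinks) is owned by OWNER_ID:GROUP_ID,
# otherwise prints the actual owner on stderr and fails.
check_owner() {
    actual=$(stat -L -c '%u:%g' "$1") || return 1
    [ "$actual" = "$2:$3" ] || {
        echo "$1 is owned by $actual, expected $2:$3" >&2
        return 1
    }
}

# Intended use:
#   check_owner /dev/zwave0 1000000 1000020 && echo "device is ready"
```

The same function can be pointed at /var/lib/lxc/jeedom1/rootfs to check the result of the chown step.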

Start the container

Now that everything is ready, we can start the container as usual:

sudo lxc-start -n jeedom1

You can now run ps on the host and confirm that the processes of the container are executed by the user 1000000.
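On a busy host, the output of ps is long, so a filter helps. The sketch below keeps only the processes whose numeric user id falls in the subordinate range. The name in_id_range is hypothetical, and I assume the procps version of ps, which accepts the uid column and the --no-headers option.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# in_id_range FIRST COUNT
# Reads "UID PID COMMAND" lines on stdin and keeps those whose UID is
# between FIRST and FIRST + COUNT - 1.
in_id_range() {
    awk -v first="$1" -v count="$2" '$1 >= first && $1 < first + count'
}

# Intended use:
#   ps -eo uid,pid,comm --no-headers | in_id_range 1000000 65536
```

The processes run by the root user of the container appear with uid 1000000, and the other users of the container appear with higher ids in the same range.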

Conclusion

This was more complicated than I expected, but once you have all the information, everything makes sense.

The extra layer of security provided by user namespaces is worth the effort. Whenever a remote code execution vulnerability is found (and yes, it will happen), you know that it's going to be difficult for the hacker to get out of this jail.

References