[Linux] LXC container: from privileged to unprivileged
In a previous article, I showed how to preserve the integrity of your Linux machine by installing unfriendly software in an LXC container.
The container was a “privileged” container, meaning that the user ids in the container map directly to the user ids of the host. This can easily be confirmed by running ps on the host: the root user of the container is the root user of the host.
In this article, we’ll see how to create an unprivileged container. I’ll assume that you have read my previous article. In particular, I’ll use jeedom1 as the name of the container; don’t forget to replace it with the name of your own container.
What is an unprivileged LXC container?
I think the best definition is found in this answer on StackOverflow:
Unprivileged LXC containers are the ones making use of user namespaces (userns). I.e. of a kernel feature that allows to map a range of UIDs on the host into a namespace inside of which a user with UID 0 can exist again.
Contrary to my initial perception of unprivileged LXC containers for a while, this does not mean that the container has to be owned by an unprivileged host user. That is only one possibility.
Owned by user or by root?
So unprivileged containers can be owned by a regular user or by root. If you’re like me, you probably want to take the most secure way: make a container owned by a user.
Well… I tried this way, but it’s not as good as you would expect:
- There is a lot more configuration
- Not all templates are compatible
- The autostart feature is not available
After one or two hours trying this way, I decided to take the other road, and that was much better.
Subordinate ids
So the container is owned by root, but the users of the container won’t match the users of the host. In other words, a process running with user id 0 from the point of view of the container will actually be executed with a different user id on the host. The same happens for the other users of the container.
On the host machine, this new id range will not match any actual users. Instead, it relies on subordinate user ids and subordinate group ids, in short subuid and subgid.
Each user of the host can have one or more ranges of subordinate ids. They are defined in /etc/subuid and /etc/subgid.
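To see which ranges a user already has, the files can be read directly: each line has the form name:first_id:count. The helper below is only a sketch; the function name subid_ranges is mine, not part of any tool.

```shell
# Print the subordinate id ranges of a user as "first-last" pairs.
# Usage: subid_ranges USER [FILE]   (FILE defaults to /etc/subuid)
# Note: subid_ranges is a hypothetical helper written for this article.
subid_ranges() {
    user=$1
    file=${2:-/etc/subuid}
    # Each line has the form name:first_id:count,
    # so the last id of the range is first_id + count - 1.
    awk -F: -v user="$user" '$1 == user { print $2 "-" ($2 + $3 - 1) }' "$file"
}

# Example:
# subid_ranges root /etc/subuid
# subid_ranges root /etc/subgid
```

An empty output means the user has no subordinate range in that file yet.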
Add subordinate ids to root
So to allow root to run an unprivileged container, we first need to give it a subordinate id range.
Edit /etc/subuid and add the following line:
root:1000000:65536
Do the same with /etc/subgid.
This will allow root to use 65536 new user and group ids, from 1000000 to 1065535.
As far as I know, it’s not possible to add comments in these files.
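If you prefer to script this step rather than edit the files by hand, here is a minimal sketch that appends the line only when it is not already there, so it can be run several times safely. The function name add_subid_range is an assumption of mine.

```shell
# Append "name:first:count" to a subid file unless the same line exists.
# Usage: add_subid_range FILE NAME FIRST COUNT
# Note: add_subid_range is a hypothetical helper written for this article.
add_subid_range() {
    file=$1
    entry="$2:$3:$4"
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF "$entry" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$file"
}

# Example (must be run as root):
# add_subid_range /etc/subuid root 1000000 65536
# add_subid_range /etc/subgid root 1000000 65536
```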
Create the LXC container
Here, nothing changed compared to the privileged container:
sudo lxc-create -t download -n jeedom1 -- -d ubuntu -r vivid -a amd64
Edit container’s configuration
We now arrive at the critical part of this tutorial: we need to tell LXC to map the users of the container to the subordinate ids we’ve just defined.
Edit /var/lib/lxc/jeedom1/config:
# Map user and group ids
lxc.include = /usr/share/lxc/config/ubuntu.userns.conf
lxc.id_map = u 0 1000000 65536
lxc.id_map = g 0 1000000 65536
As you can see, we added two lxc.id_map instructions: one to map the user ids and one to map the group ids.
We also added an lxc.include instruction to load the user namespace settings of the container template. If you use another template, you have to use another file, like centos.userns.conf for instance.
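The fields of an lxc.id_map line are the type (u or g), the first id inside the container, the first id on the host and the number of ids. With our values, translating a container id to a host id is a simple addition, as this sketch shows. The function name host_id and its default arguments are my own choices for illustration.

```shell
# Translate a container id to the host id for a map "0 HOST_FIRST COUNT".
# Usage: host_id CONTAINER_ID [HOST_FIRST] [COUNT]
# Note: host_id is a hypothetical helper; the defaults match this article.
host_id() {
    id=$1
    first=${2:-1000000}
    count=${3:-65536}
    # Ids outside 0..COUNT-1 have no mapping on the host
    if [ "$id" -lt 0 ] || [ "$id" -ge "$count" ]; then
        echo "id $id is outside the mapped range" >&2
        return 1
    fi
    echo $((first + id))
}

# Example:
# host_id 0     # root of the container
# host_id 20    # id 20 of the container
```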
Change the owner of rootfs/
If you try to start the container right now, it won’t work because it won’t be able to read its own filesystem (remember that the user id 0 of the container is mapped to the user id 1000000 of the host).
So we need to change the owner of the file system so that it matches the root user of the container.
sudo chown -R 1000000:1000000 /var/lib/lxc/jeedom1/rootfs
We also make sure that this user can have access to this folder:
sudo chmod 755 /var/lib/lxc
sudo chmod 755 /var/lib/lxc/jeedom1
sudo chmod 640 /var/lib/lxc/jeedom1/config
sudo chmod 750 /var/lib/lxc/jeedom1/rootfs
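To double-check that the recursive chown covered the whole tree, one option is to list whatever is still owned by someone else. This is only a sketch: the function name not_owned_by is mine, and it has to run as root to be able to walk the rootfs.

```shell
# List every file under DIR that is not owned by the given uid and gid.
# An empty output means the whole tree has the expected owner.
# Usage: not_owned_by DIR UID GID
# Note: not_owned_by is a hypothetical helper written for this article.
not_owned_by() {
    find "$1" \( ! -uid "$2" -o ! -gid "$3" \) -print
}

# Example (must be run as root):
# not_owned_by /var/lib/lxc/jeedom1/rootfs 1000000 1000000
```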
Change the owner of devices
In my previous article, I showed how to use /dev/ttyACM0 from a privileged container. Our new container is not allowed to access that device, so we need to change the ownership of the device:
Create the file /etc/udev/rules.d/99-zwave.rules:
# 0658:0200 Sigma Designs, Inc.
SUBSYSTEM=="tty", ATTRS{idVendor}=="0658", ATTRS{idProduct}=="0200", \
SYMLINK+="zwave%n", OWNER="1000000", GROUP="1000020"
This rule is specific to the ZWave adapter I’m using, so you obviously need to update it to match your hardware.
The important parts here are OWNER="1000000", which means the device will be owned by the root user of the container, and GROUP="1000020", which means the device group will be the dialout group of the container (group id 20 inside the container, shifted by 1000000 on the host).
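If your container uses a different group, or a different base for the id range, the value for GROUP can be derived from the group file of the container. The sketch below assumes that file lives at rootfs/etc/group and that the mapping starts at 1000000; the function name host_gid_of is mine.

```shell
# Print the host gid of a container group, given the container's group
# file and the first host id of the mapping.
# Usage: host_gid_of GROUP GROUP_FILE [HOST_FIRST]
# Note: host_gid_of is a hypothetical helper written for this article.
host_gid_of() {
    # Group file lines have the form name:password:gid:members
    awk -F: -v name="$1" -v first="${3:-1000000}" \
        '$1 == name { print first + $3; found = 1 }
         END { exit !found }' "$2"
}

# Example (must be run as root to read the rootfs):
# host_gid_of dialout /var/lib/lxc/jeedom1/rootfs/etc/group
```

The function returns a non-zero status when the group does not exist in the file.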
To trigger the new rule, run:
sudo udevadm control --reload
sudo udevadm trigger
Start the container
Now that everything is ready, we can start the container as usual:
sudo lxc-start -n jeedom1
You can now run ps on the host and confirm that the processes of the container are executed by the user 1000000.
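On a busy host, the output of ps is long, so it can help to keep only the processes whose user id falls in the subordinate range. Here is a small sketch of such a filter; the function name in_subid_range and its defaults are my own.

```shell
# Keep only the lines whose first field (a numeric uid) falls in the
# range HOST_FIRST .. HOST_FIRST + COUNT - 1.
# Usage: ps -e -o uid= -o pid= -o comm= | in_subid_range [HOST_FIRST] [COUNT]
# Note: in_subid_range is a hypothetical helper written for this article.
in_subid_range() {
    awk -v first="${1:-1000000}" -v count="${2:-65536}" \
        '$1 >= first && $1 < first + count'
}

# Example:
# ps -e -o uid= -o pid= -o comm= | in_subid_range
```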
Conclusion
This was more complicated than I expected, but once you have all the information, everything makes sense.
The extra layer of security provided by user namespaces is worth the effort. When a remote code execution vulnerability is found (and yes, it will happen), you know that it’s going to be difficult for the attacker to get out of this jail.