
What Are Linux Namespaces and What Are They Used for?

Sep 2, 2020


Linux namespaces are the underlying tech behind container technologies like Docker. They’re a feature of the Linux kernel that lets the system restrict which resources containerized processes can see, so that none of them can interfere with another.

What Are Namespaces?

When you’re running many different processes and applications on a single server, as is the case with deployment tools like Kubernetes, it’s important to have each process isolated, mostly for security.

One container shouldn’t be able to gain control over another’s resources, because if that container is then compromised, it could compromise the entire system. The danger is similar to the one exposed by the CPU bug Meltdown, which let ordinary processes read memory that should have been isolated from them. In the same way, processes running in different virtual systems (containers) should be isolated from each other.

Namespaces achieve this isolation at the kernel level. Much as the chroot command jails a process in a different root directory, namespaces separate other aspects of the system. This article covers seven namespaces (Linux 5.6 also added a time namespace, which virtualizes the system’s monotonic and boot-time clocks):

Mount, or mnt. Very similar to chroot, the Mount namespace virtually partitions the file system. Each mount namespace has its own set of mount points, so processes in it see only the file system tree set up for them, and mounts made inside it don’t affect the rest of the system. Container runtimes combine it with pivot_root to give each container its own root directory, which is much harder to escape than a plain chroot.
Process, or pid. In Linux, every process descends from PID 1, which forms the root of the process tree. The process namespace cuts off a branch of the PID tree, and processes inside it can’t see or signal processes further up the branch. A process in a child namespace actually has one PID for each namespace level it belongs to: its global PID, used by the main system, and a PID within the child namespace, where numbering restarts from 1.
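Interprocess Communication, or ipc. This namespace isolates System V IPC objects (shared memory, semaphores, and message queues) and POSIX message queues, so processes can only use these mechanisms to talk directly to other processes in the same namespace.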
Network, or net. This namespace manages which network devices a process can see, along with its own IP addresses, routing tables, and firewall rules. However, it doesn’t set up anything for you automatically: you’ll still need to create virtual network devices and manage the connection between global network interfaces and child network interfaces. Containerization software like Docker already handles this and can manage networking for you.
User. This namespace allows processes to have “virtual root” inside their own namespace without having actual root access to the parent system. It also partitions off UID and GID information, so child namespaces can have their own user configurations.
UTS. This namespace controls hostname and domain information, and allows processes to think they’re running on differently named servers.
Cgroup. Control groups (cgroups) are a related kernel feature that lets the system set resource limits (CPU, memory, disk I/O, number of processes, etc.) for a group of processes. This is useful for containerized apps, but cgroups themselves don’t do any kind of “information isolation” like namespaces do. The cgroup namespace is a separate thing: it only controls which part of the cgroup hierarchy a process can see, and does not assign the process to a specific cgroup.
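Two of the namespaces above leave visible traces in procfs. On Linux 4.1 and later, the NSpid line in /proc/<pid>/status lists a process’s PID in each PID namespace it belongs to, and /proc/<pid>/uid_map shows how UIDs inside a user namespace map to UIDs outside it, one “inside outside length” range per line. The Python sketch below parses both; parse_nspid, parse_uid_map, and uid_outside are hypothetical helper names, not part of any library.

```python
from __future__ import annotations


def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs on the NSpid line of /proc/<pid>/status.

    The first entry is the PID as seen from the PID namespace of this /proc
    mount (the global one on a normal host); the last is the PID inside the
    process's own, innermost PID namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line found (kernels before 4.1 lack it)")


def parse_uid_map(map_text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map lines of the form 'inside-uid outside-uid length'."""
    ranges = []
    for line in map_text.splitlines():
        if line.strip():
            inside, outside, length = (int(x) for x in line.split())
            ranges.append((inside, outside, length))
    return ranges


def uid_outside(uid: int, ranges: list[tuple[int, int, int]]) -> int | None:
    """Translate a UID inside the user namespace to the UID outside it."""
    for inside, outside, length in ranges:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    # Unmapped IDs are reported by the kernel as the overflow UID (usually 65534).
    return None
```

On the host, reading /proc/<pid>/status for a process that lives inside a container should show two NSpid entries (the host PID, then the container PID). In the initial user namespace, uid_map is a single identity mapping covering every UID: 0 0 4294967295.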

By default, a new process inherits its parent’s namespaces, so unless something specifies otherwise, the processes you run, and most processes on your system, use the global namespaces.
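You can check this inheritance directly: every process has a /proc/<pid>/ns directory of symbolic links whose targets have the form type:[inode], and two processes share a namespace when those inode numbers match. The Python sketch below reads and compares them; namespace_ids and shared_namespaces are hypothetical helper names, and reading another user’s process normally requires root.

```python
from __future__ import annotations

import os
import re

# Link targets under /proc/<pid>/ns look like "net:[4026531992]": type, then inode.
NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'net:[4026531992]' into ('net', 4026531992)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match["type"], int(match["inode"])


def namespace_ids(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns (mnt, net, pid, ...) to its inode number."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for entry in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        ids[entry] = inode
    return ids


def shared_namespaces(pid_a: str, pid_b: str) -> set[str]:
    """Return the /proc/<pid>/ns entries on which two processes agree."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return {entry for entry in a.keys() & b.keys() if a[entry] == b[entry]}
```

A child started with subprocess.Popen(["sleep", "5"]) should report the same IDs as its parent, so shared_namespaces("self", str(child.pid)) would normally return every entry. The inode numbers are also what lsns shows in its NS column.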

Working with Namespaces

You can use the lsns command (“list namespaces”) to view the namespaces currently active on your system. Run it as root; otherwise, the list may be incomplete.

In the lsns output from a fresh Ubuntu install, each namespace is listed alongside the lowest-numbered process in it, that process’s user, and its command. The seven namespaces listed under /sbin/init with PID 1 are the seven global namespaces. The only other namespaces are mnt namespaces for system daemons, along with Canonical’s Livepatch service.

If you were working with containers, this list would be much longer. With the -J flag, lsns prints the list as JSON, which is much easier to consume from a scripting language.
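As an example of scripting against that JSON, here is a Python sketch that runs lsns -J and counts namespaces by type. It assumes the default columns (NS, TYPE, NPROCS, PID, USER, COMMAND), emitted as lowercase keys inside a top-level namespaces array; depending on the util-linux version, numeric fields may arrive as JSON numbers or as strings, so the sketch normalizes them. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
from collections import Counter


def parse_lsns_json(text: str) -> list[dict]:
    """Return the namespace records from `lsns -J` output."""
    records = json.loads(text)["namespaces"]
    for record in records:
        # Numeric columns may be strings in some util-linux versions; normalize.
        for key in ("ns", "nprocs", "pid"):
            if record.get(key) is not None:
                record[key] = int(record[key])
    return records


def count_by_type(records: list[dict]) -> Counter:
    """Count how many namespaces of each type (mnt, net, ...) were listed."""
    return Counter(record["type"] for record in records)


def run_lsns() -> list[dict]:
    """Run lsns (as root for a complete list) and parse its JSON output."""
    result = subprocess.run(
        ["lsns", "-J"], capture_output=True, text=True, check=True
    )
    return parse_lsns_json(result.stdout)
```

On a container host, count_by_type(run_lsns()) gives a quick view of how many mnt, net, and pid namespaces your runtime has created.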

You can run commands inside another process’s namespaces with the nsenter utility. It lets you “enter” the namespaces of another process, usually for debugging purposes. It can run any command in those namespaces, but by default it starts your shell (the value of $SHELL, falling back to /bin/sh).

You specify a process ID, then each namespace that you want to enter:

sudo nsenter -t PID --mount --net --pid   # and so on for other namespaces

For example, if you try to enter the mount namespace of kdevtmpfs, nsenter switches you into that namespace but then fails because it can’t find /bin/bash. That failure actually means it worked: the apparent root directory changed.

If your child mnt namespace included /bin/bash, you could enter it and get a shell. You could copy the file in manually, but it’s better to use bind mounts, which can manipulate the directory tree and link files across mnt namespaces. This leads to some interesting use cases, like making two processes read different contents from the same path.
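Under the hood, nsenter relies on the setns(2) system call: it opens the target process’s files under /proc/<pid>/ns and joins each namespace in turn. The Python sketch below mimics that through ctypes. ns_paths and enter_namespaces are hypothetical names; the sketch must run as root in a single-threaded process, it changes the namespaces of the calling Python process itself, and it does not reproduce nsenter’s handling of entry order or the time namespace.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import os

# Entry names under /proc/<pid>/ns that this sketch accepts.
NAMESPACES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")


def ns_paths(pid: int, kinds: list[str]) -> list[str]:
    """Build the /proc paths of the namespaces of `pid` that we want to join."""
    unknown = set(kinds) - set(NAMESPACES)
    if unknown:
        raise ValueError(f"unknown namespace type(s): {sorted(unknown)}")
    return [f"/proc/{pid}/ns/{kind}" for kind in kinds]


def enter_namespaces(pid: int, kinds: list[str]) -> None:
    """Move the calling process into another process's namespaces (needs root).

    All descriptors are opened first, because after joining the target's
    mount namespace, /proc may no longer show the same processes. Joining a
    PID namespace only affects children forked afterwards.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fds = [os.open(path, os.O_RDONLY) for path in ns_paths(pid, kinds)]
    try:
        for fd in fds:
            # nstype 0 means "accept whatever namespace type fd refers to".
            if libc.setns(fd, 0) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        for fd in fds:
            os.close(fd)
```

As a hypothetical example, running enter_namespaces(1234, ["net"]) as root and then os.system("ip addr") would list the interfaces visible to process 1234, much like sudo nsenter -t 1234 --net ip addr.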

New namespaces are created from an existing process (usually one in the global namespaces) by specifying which of its namespaces should be replaced. This is done with the unshare command, which runs a command with new namespaces “unshared” from its parent’s.

To unshare the hostname (UTS) namespace, use:

sudo unshare -u command

If the command is left blank, unshare runs your shell ($SHELL, or /bin/sh if it’s unset). The new namespace then shows up in lsns’s output.

Here, the terminal multiplexer screen keeps bash running in the background; otherwise, the namespace would disappear as soon as its last process exited.
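The unshare command is built on the unshare(2) system call, which moves the calling process into new namespaces selected by CLONE_NEW* flags. The Python sketch below calls it through ctypes to do roughly what sudo unshare -u followed by setting a hostname does. The helper names are hypothetical, the flag values come from <linux/sched.h>, the actual unshare step needs root (CAP_SYS_ADMIN), and it changes the calling Python process itself, so run it in a throwaway process.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import os
import socket

# Flag values from <linux/sched.h>, keyed by the unshare(1) option that requests them.
CLONE_FLAGS = {
    "mount": 0x00020000,   # -m, CLONE_NEWNS
    "cgroup": 0x02000000,  # -C, CLONE_NEWCGROUP
    "uts": 0x04000000,     # -u, CLONE_NEWUTS
    "ipc": 0x08000000,     # -i, CLONE_NEWIPC
    "user": 0x10000000,    # -U, CLONE_NEWUSER
    "pid": 0x20000000,     # -p, CLONE_NEWPID
    "net": 0x40000000,     # -n, CLONE_NEWNET
}


def clone_flags(*kinds: str) -> int:
    """Combine the flags for the requested namespace types."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(*kinds: str) -> None:
    """Move the calling process into new namespaces of the given types."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.unshare(clone_flags(*kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def hostname_in_new_uts(name: str) -> str:
    """Create a new UTS namespace, rename it, and return the hostname seen there."""
    unshare("uts")            # needs CAP_SYS_ADMIN
    socket.sethostname(name)  # only changes the hostname of the new UTS namespace
    return socket.gethostname()
```

Run as root, hostname_in_new_uts("sandbox") should return "sandbox" while hostname in another terminal still prints the host’s real name. Note that the "pid" flag does not move the caller itself; only children forked afterwards join the new PID namespace, which is why unshare is usually run with --fork when creating PID namespaces.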

Unless you’re doing very low-level programming, you probably won’t have to touch namespaces yourself. Containerization programs like Docker manage the details for you, and in most cases where you need process isolation, you should just use an existing tool. It’s still important to understand how namespaces work in the context of containerization, though, especially if you’re doing any low-level configuration of your Docker containers or have to do manual debugging.