Traditionally, Linux separates users and their processes into two different groups: root (user ID 0) and everyone else. Back in 1999, with the 2.2 Linux kernel release, kernel developers started breaking up the privileges of the root user into distinct capabilities, allowing processes to inherit subsets of root’s privilege, without giving away too much. Fast-forward to late 2019 (Linux is up to version 5.4.x, by the way), and we have over three dozen different capabilities to assign out.
Here are some interesting ones:
CAP_KILL – Enables a process to send signals to other processes, regardless of their effective UIDs.
CAP_IPC_LOCK – Allows a process to lock memory, to ensure that it doesn’t get swapped out to disk. Security-conscious tools that handle credentials in RAM rely on this.
CAP_MKNOD – Lets you create special files (like block devices). Think of it! Your very own /dev/null!
CAP_NET_ADMIN – Enables most network management, including interface configuration, packet filtering and more. This is a fairly broad scope.
CAP_NET_BIND_SERVICE – Allows binding of so-called “privileged” ports (those below 1024).
So how do we find out what capabilities we have? As with most things process-oriented, we can look in the lovely /proc filesystem.
We can look at our own capability set using /proc/self:
→ grep Cap /proc/self/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Each of these hex strings is a bitmask of capabilities, and each set serves a slightly different purpose. We’re mostly going to focus on CapPrm, the “permitted set”. Since I ran grep as myself, I have no capabilities, and CapPrm is all zeroes.
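If you would rather read these fields from a script than from grep, the parsing is straightforward. What follows is only a minimal Python sketch: the function name parse_cap_sets is made up for illustration, and the fallback sample text is simply the unprivileged output shown above.

```python
def parse_cap_sets(status_text):
    """Return {field_name: integer_mask} for the Cap* lines of a status dump."""
    caps = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Cap"):
            # Values are hexadecimal bitmasks, padded with whitespace or a tab.
            caps[name] = int(value.strip(), 16)
    return caps


# Sample copied from the unprivileged grep output above.
SAMPLE = """\
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
"""

if __name__ == "__main__":
    try:
        with open("/proc/self/status") as handle:
            text = handle.read()
    except OSError:
        # Not on Linux (or /proc is unavailable): fall back to the sample.
        text = SAMPLE
    for field, mask in parse_cap_sets(text).items():
        print(f"{field}: {mask:016x}")
```

Non-capability lines in the status file are skipped, so the whole file can be passed in unfiltered.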
Let’s run it as root and see what happens:
→ sudo grep Cap /proc/self/status
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Hey, look at that. All the capabilities!
If you don’t immediately grok that long string of hexadecimal, fret not! I wrote a small utility, which you can find on GitHub, that prints out human-friendly names and descriptions. The easiest way to run this is inside of Docker, using my huntprod/caps image:
→ docker run --rm huntprod/caps 0000003fffffffff
0000003fffffffff (38 capabilities):
chown 0 (0x00000000000001) Make arbitrary changes to file UIDs and GIDs
dac_override 1 (0x00000000000002) Bypass file read, write, and execute permission checks.
dac_read_search 2 (0x00000000000004) Bypass file read permission checks and directory read and execute permission checks.
fowner 3 (0x00000000000008) Bypass file ownership / process owner equality permission checks.
fsetid 4 (0x00000000000010) Don't clear set-user-ID and set-group-ID mode bits when a file is modified
kill 5 (0x00000000000020) Bypass permission checks for sending signals.
setgid 6 (0x00000000000040) Make arbitrary manipulations of process GIDs and supplementary GID list.
setuid 7 (0x00000000000080) Make arbitrary manipulations of process UIDs.
... etc ...
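The decoding itself is nothing more than testing one bit per capability. The sketch below is not the utility's source, just an illustration in Python under a couple of assumptions: the name table follows the bit numbering in the kernel's linux/capability.h as of the 38 capabilities in Linux 5.4, and decode_caps is a hypothetical name.

```python
# Capability names indexed by bit number (0 through 37, as of Linux 5.4).
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid",
    "kill", "setgid", "setuid", "setpcap", "linux_immutable",
    "net_bind_service", "net_broadcast", "net_admin", "net_raw", "ipc_lock",
    "ipc_owner", "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace",
    "sys_pacct", "sys_admin", "sys_boot", "sys_nice", "sys_resource",
    "sys_time", "sys_tty_config", "mknod", "lease", "audit_write",
    "audit_control", "setfcap", "mac_override", "mac_admin", "syslog",
    "wake_alarm", "block_suspend", "audit_read",
]


def decode_caps(hex_mask):
    """Return the names of the capabilities whose bits are set in hex_mask."""
    mask = int(hex_mask, 16)
    # Bits beyond the end of the table (newer kernels) are silently ignored.
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


print(len(decode_caps("0000003fffffffff")))  # 38
print(decode_caps("00000000000000a0"))       # ['kill', 'setuid']
```

Newer kernels define capabilities past bit 37, so the table would need extending before this is used on anything more recent.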
Looking at Subsets of Capabilities
If we want to look at some middle ground, we need a way of dropping permitted capabilities. There are two ways to do this: via filesystem attributes using the setcap program, and via containers. Frankly, containers are a lot easier, since both Docker and Kubernetes have first-class support for explicitly specifying the permitted set of capabilities.
Let’s start with Docker.
If we run the huntprod/caps image with no arguments, it searches through /proc/self/status, grabs the permitted capability set, and displays it.
→ docker run --rm huntprod/caps
(via /proc/self/status)
00000000a80425fb (14 capabilities):
chown 0 (0x00000000000001) Make arbitrary changes to file UIDs and GIDs
dac_override 1 (0x00000000000002) Bypass file read, write, and execute permission checks.
fowner 3 (0x00000000000008) Bypass file ownership / process owner equality permission checks.
fsetid 4 (0x00000000000010) Don't clear set-user-ID and set-group-ID mode bits when a file is modified
kill 5 (0x00000000000020) Bypass permission checks for sending signals.
setgid 6 (0x00000000000040) Make arbitrary manipulations of process GIDs and supplementary GID list.
setuid 7 (0x00000000000080) Make arbitrary manipulations of process UIDs.
setpcap 8 (0x00000000000100) Manage capability sets (from bounded / inherited set).
net_bind_service 10 (0x00000000000400) Bind a socket to Internet domain privileged ports.
net_raw 13 (0x00000000002000) Use RAW and PACKET sockets.
sys_chroot 18 (0x00000000040000) Use chroot(2) and manage kernel namespaces.
mknod 27 (0x00000008000000) Create special files using mknod(2).
audit_write 29 (0x00000020000000) Write records to kernel auditing log.
setfcap 31 (0x00000080000000) Set arbitrary capabilities on a file.
Voila!
Without specifying anything, my Docker container was restricted down to a subset of capabilities, a80425fb.
Unsurprisingly, if we are a privileged container, we get the full capability set:
→ docker run --privileged --rm huntprod/caps | head -n2
(via /proc/self/status)
0000003fffffffff (38 capabilities):
Let’s try explicitly asking for a capability:
→ docker run --rm --cap-add ipc_lock huntprod/caps
(via /proc/self/status)
00000000a80465fb (15 capabilities):
chown 0 (0x00000000000001) Make arbitrary changes to file UIDs and GIDs
dac_override 1 (0x00000000000002) Bypass file read, write, and execute permission checks.
fowner 3 (0x00000000000008) Bypass file ownership / process owner equality permission checks.
fsetid 4 (0x00000000000010) Don't clear set-user-ID and set-group-ID mode bits when a file is modified
kill 5 (0x00000000000020) Bypass permission checks for sending signals.
setgid 6 (0x00000000000040) Make arbitrary manipulations of process GIDs and supplementary GID list.
setuid 7 (0x00000000000080) Make arbitrary manipulations of process UIDs.
setpcap 8 (0x00000000000100) Manage capability sets (from bounded / inherited set).
net_bind_service 10 (0x00000000000400) Bind a socket to Internet domain privileged ports.
net_raw 13 (0x00000000002000) Use RAW and PACKET sockets.
ipc_lock 14 (0x00000000004000) Lock memory, via mlock(2) and friends.
sys_chroot 18 (0x00000000040000) Use chroot(2) and manage kernel namespaces.
mknod 27 (0x00000008000000) Create special files using mknod(2).
audit_write 29 (0x00000020000000) Write records to kernel auditing log.
setfcap 31 (0x00000080000000) Set arbitrary capabilities on a file.
What if we only want the IPC_LOCK capability, and none of the others?
→ docker run --rm --cap-drop all --cap-add ipc_lock huntprod/caps
(via /proc/self/status)
0000000000004000 (1 capability):
ipc_lock 14 (0x00000000004000) Lock memory, via mlock(2) and friends.
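The masks in these runs are related by plain bit arithmetic, which is easy to check by hand. The Python sketch below only models that arithmetic; it says nothing about how Docker applies the flags internally, and the helper names are invented for illustration.

```python
DOCKER_DEFAULT = 0x00000000A80425FB  # permitted set from the plain `docker run` above
IPC_LOCK_BIT = 14                    # bit number shown in the ipc_lock row


def add_cap(mask, bit):
    """Return mask with the given capability bit set."""
    return mask | (1 << bit)


def drop_cap(mask, bit):
    """Return mask with the given capability bit cleared."""
    return mask & ~(1 << bit)


# --cap-add ipc_lock on top of the defaults
with_ipc_lock = add_cap(DOCKER_DEFAULT, IPC_LOCK_BIT)
# --cap-drop all --cap-add ipc_lock: start from an empty mask
only_ipc_lock = add_cap(0, IPC_LOCK_BIT)

print(f"{with_ipc_lock:016x}")  # 00000000a80465fb
print(f"{only_ipc_lock:016x}")  # 0000000000004000
```

Counting the set bits in the first result (for example with bin(with_ipc_lock).count("1")) gives the 15 capabilities reported above.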
Doing It For Real: docker-compose
Docker Compose supports the same facilities as Docker itself for managing your set of capabilities, using the cap_add and cap_drop lists in your docker-compose.yml specification:
version: '2'
services:
  pause:
    image: starkandwayne/pause
    cap_drop:
      - ALL
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN
See Compose file reference documentation for more information.
Doing It For Real: Kubernetes
In Kubernetes, you can do something similar, but it has to be configured in a security context, like this:
apiVersion: v1
kind: Pod
metadata:
  name: linux-capabilities
spec:
  containers:
    - name: pause
      image: starkandwayne/pause:latest
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
            - SYS_ADMIN
See the Kubernetes Pod SecurityContext documentation for the full story.
Capabilities in the Wild
Not all Linux capabilities are created equal. Some of them you will rarely need, but others will pop up quite frequently. Here are our top three all-time faves!
IPC_LOCK – Being able to lock memory via mlock(2) and friends comes up far more frequently than you would think, especially when you’re running security-minded infrastructure software.
LINUX_IMMUTABLE – Sometimes, you know you’re never going to rewrite a file, for auditing purposes or whatnot, and the ability to flag the filesystem entry (via chattr(1), which uses ioctl(2)) so that it only ever allows append operations is a definite boon.
NET_BIND_SERVICE – Not all software is cloud-native, and some of it even has hard-coded “privileged” ports. This capability allows that particular shenanigan without widening the attack surface needlessly.
Hopefully, you now have a better understanding and a greater appreciation for the power and utility of Linux capabilities, and are gearing up to incorporate them into your next Docker or Kubernetes deployment.
Happy Hacking!