Kubernetes Privileged Pod - Overview
- You can configure a container inside a Kubernetes Pod to run in privileged mode using the securityContext.
- Running a container in privileged mode gives it full access to the node's kernel.
- You can also fine-tune the list of privileges assigned to any container inside the Kubernetes Pod by dropping (or adding) specific Linux capabilities.
- Additionally, you can define a number of other security-related settings such as runAsUser, runAsNonRoot, etc.
In this tutorial we will concentrate only on Kubernetes Privileged and NON-Privileged Pod examples. You can check the following articles if you want to learn more about the Principle of Least Privilege in Kubernetes:
- Kubernetes SecurityContext Explained with Examples
- Kubernetes SecurityContext Capabilities Explained [Examples]
- Create Pod Security Policy Kubernetes [Step-by-Step]
Set up a Kubernetes Cluster (Prerequisite)
This article assumes that you already have a Kubernetes cluster. I will be using my multi-node Kubernetes cluster running the Calico network plugin with Docker as the container runtime. You can also use a minikube cluster, as it is easier to bring up and good for learning Kubernetes.
Example-1: Create Kubernetes Privileged Pod (With all Capabilities)
In this example we will create a simple pod using the centos image with full privileges and all Linux capabilities. To create a privileged pod, we just add privileged: true inside the securityContext section as shown below:
[root@centos8-1 ~]# cat privileged-pod-1.yaml
Sample Output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-1
  namespace: default
spec:
  containers:
  - name: centos
    image: centos
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: true
Let's create this pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-1.yaml
pod/test-pod-1 created
Check if the pod is successfully created and running:
[root@centos8-1 ~]# kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
test-pod-1   1/1     Running   0          5s
Connect to the pod and verify whether all the Linux capabilities are enabled:
[root@centos8-1 ~]# kubectl exec -it test-pod-1 -- bash
Use the capsh command to check the list of capabilities allowed in the pod:
[root@test-pod-1 /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=0(root)
gid=0(root)
groups=
As you can see, practically every Linux capability (38 of them here) is granted to this pod. Additionally, this pod is running as the root user.
Additionally, you can add allowPrivilegeEscalation: true inside the securityContext field to allow privilege escalation:
...
    securityContext:
      privileged: true
      allowPrivilegeEscalation: true
Defining allowPrivilegeEscalation is optional here: when it is not set, privilege escalation is allowed by default, and Kubernetes always treats it as true for a privileged container. If you try to combine allowPrivilegeEscalation: false with privileged: true, the API server rejects the pod with the following error:
spec.containers[0].securityContext: Invalid value: core.SecurityContext{Capabilities:(*core.Capabilities)(nil), Privileged:(*bool)(0xc00d5ca567), SELinuxOptions:(*core.SELinuxOptions)(nil), WindowsOptions:(*core.WindowsSecurityContextOptions)(nil), RunAsUser:(*int64)(nil), RunAsGroup:(*int64)(nil), RunAsNonRoot:(*bool)(nil), ReadOnlyRootFilesystem:(*bool)(nil), AllowPrivilegeEscalation:(*bool)(0xc00d5ca566), ProcMount:(*core.ProcMountType)(nil), SeccompProfile:(*core.SeccompProfile)(nil)}: cannot set `allowPrivilegeEscalation` to false and `privileged` to true
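To make the rule behind this error concrete, here is a minimal Python sketch. validate_security_context is a hypothetical helper that checks only this one combination on a securityContext represented as a plain dict; it is not the actual API server validation code, which checks many more fields.

```python
from typing import Any, Dict, List


def validate_security_context(sc: Dict[str, Any]) -> List[str]:
    """Hypothetical check mirroring only the rule quoted in the error above."""
    errors: List[str] = []
    privileged = sc.get("privileged") is True
    # Only an explicit false conflicts; leaving the field unset is accepted.
    escalation_disabled = sc.get("allowPrivilegeEscalation") is False
    if privileged and escalation_disabled:
        errors.append(
            "cannot set `allowPrivilegeEscalation` to false "
            "and `privileged` to true"
        )
    return errors


if __name__ == "__main__":
    ok = {"privileged": True, "allowPrivilegeEscalation": True}
    bad = {"privileged": True, "allowPrivilegeEscalation": False}
    print(validate_security_context(ok))   # []
    print(validate_security_context(bad))  # one error message
```

Treating an unset field differently from an explicit false matches the behaviour described above, where privileged: true on its own is accepted.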
Example-2: Create non-privileged Kubernetes Pod
Based on the heading, you would assume that in this example we will create a non-privileged pod. We will, but this pod will NOT be COMPLETELY without privileges: it will still have some capabilities.
Here is the YAML file to create test-pod-2:
[root@centos8-1 ~]# cat privileged-pod-2.yaml
Sample Output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-2
  namespace: default
spec:
  containers:
  - name: centos
    image: centos
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
Notice that inside the container's securityContext I have explicitly disabled both privileged and allowPrivilegeEscalation. So the assumption is: if enabling privileged gives the container all the privileges, then setting privileged to false should drop all of them, right?
Sorry, but that is not the case. By default, a Kubernetes container still gets the container runtime's default set of Linux capabilities.
Let's create this Pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-2.yaml
pod/test-pod-2 created
Connect to this pod once it is in Running state:
[root@centos8-1 ~]# kubectl exec -it test-pod-2 -- bash
Verify the list of Linux capabilities assigned to this Pod:
[root@test-pod-2 /]# capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
...
This is what I was referring to earlier: even with privileged: false and allowPrivilegeEscalation: false, the pod still has 14 capabilities, which are the container runtime's defaults.
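If you want to see exactly what privileged: true adds on top of these defaults, you can compare the two capsh outputs programmatically. The sketch below assumes the simple single-group "Current: = name,name,...+flags" format seen in this article (capsh can print more complex combinations) and uses strings copied from the test-pod-1 and test-pod-2 outputs above; parse_current is a hypothetical helper name.

```python
from typing import Set

# Copied from the capsh output of test-pod-1 (privileged: true).
PRIVILEGED_CURRENT = (
    "Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,"
    "cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,"
    "cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,"
    "cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,"
    "cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,"
    "cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,"
    "cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,"
    "cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+eip"
)
# Copied from the capsh output of test-pod-2 (runtime defaults).
DEFAULT_CURRENT = (
    "Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,"
    "cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,"
    "cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip"
)


def parse_current(line: str) -> Set[str]:
    """Return the capability names from a simple 'Current: = a,b,c+eip' line."""
    value = line.split("=", 1)[1].strip()
    if not value:
        return set()  # "Current: =" means no capabilities at all
    names, _, _flags = value.partition("+")  # strip the "+eip" style suffix
    return {name.strip() for name in names.split(",") if name.strip()}


if __name__ == "__main__":
    privileged = parse_current(PRIVILEGED_CURRENT)
    default = parse_current(DEFAULT_CURRENT)
    print(len(privileged), len(default))  # 38 14
    print(sorted(privileged - default))   # what privileged: true adds
```

With these two outputs the difference contains 24 capabilities, among them cap_sys_admin, cap_net_admin and cap_sys_module, none of which the default container gets.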
Let's try to perform a task which requires privileges, such as su:
[root@test-pod-2 /]# su -
[root@test-pod-2 ~]#
It worked, although you would expect it to fail on a non-privileged pod.
Let's try to install an rpm:
[root@test-pod-2 ~]# yum -y install sudo -q
Failed to set locale, defaulting to C.UTF-8
warning: /var/cache/dnf/baseos-f6a80ba95cf937f2/packages/sudo-1.8.29-7.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY
Importing GPG key 0x8483C65D:
Userid : "CentOS (CentOS Official Signing Key) <security@centos.org>"
Fingerprint: 99DB 70FA E1D7 CE22 7FB6 4882 05B5 55B3 8483 C65D
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
Installed:
sudo-1.8.29-7.el8.x86_64
So we are also able to install an rpm on this "non-privileged" pod. Clearly, privileged: false alone does not make a container truly non-privileged.
Let's check the next example, which actually creates a non-privileged container inside a Kubernetes Pod.
Example-3: Create non-privileged Kubernetes Pod (DROP all CAPABILITIES)
In this example I will show you the proper way to create a truly non-privileged container inside a Kubernetes Pod. We will create a new YAML file and additionally drop all the Linux capabilities inside the container using the securityContext.
Here is my sample YAML file for test-pod-3:
[root@centos8-1 ~]# cat privileged-pod-3.yaml
Sample Output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-3
  namespace: default
spec:
  containers:
  - name: centos
    image: centos
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
Notice that I have additionally added a capabilities section with ALL under drop, which drops all the Linux capabilities inside the container.
Let's create this pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-3.yaml
pod/test-pod-3 created
Connect to the pod once it is in Running state:
[root@centos8-1 ~]# kubectl exec -it test-pod-3 -- bash
Verify the list of capabilities assigned to this container:
[root@test-pod-3 /]# capsh --print
This time no Linux capabilities are loaded in this container.
Let us try some tasks that would normally require privileges, such as switching user (su):
[root@test-pod-3 /]# su -
su: cannot set groups: Operation not permitted
Great, so su is not allowed this time.
Let us try to install an rpm:
[root@test-pod-3 /]# yum -y install sudo -q
Failed to set locale, defaulting to C.UTF-8
....
Error unpacking rpm package sudo-1.8.29-7.el8.x86_64
Verifying : sudo-1.8.29-7.el8.x86_64 1/1
Failed:
sudo-1.8.29-7.el8.x86_64
Error: Transaction failed
The installation of rpm was also not allowed.
HEY, but we are still logged in as root! Even though the Linux capabilities are removed, root still owns most of the files in the container, so the ordinary file permissions let it damage the file system.
For example, what if I go ahead and delete the files inside /etc:
[root@test-pod-3 /]# cd /etc/
[root@test-pod-3 etc]# ls -l
total 1120
-rw-r--r--. 1 root root   14 Sep 15 14:17 BUILDTIME
-rw-r--r--. 1 root root   94 May 11  2019 GREP_COLORS
drwxr-xr-x. 3 root root 4096 Sep 15 14:17 NetworkManager
drwxr-xr-x. 6 root root 4096 Sep 15 14:17 X11
-rw-r--r--. 1 root root   16 Sep 15 14:17 adjtime
-rw-r--r--. 1 root root 1529 May 15  2020 aliases
drwxr-xr-x. 2 root root 4096 Sep 15 14:17 alternatives
drwxr-xr-x. 2 root root 4096 Nov  3  2020 bash_completion.d
-rw-r--r--. 1 root root 3019 May 15  2020 bashrc
-rw-r--r--. 1 root root 1629 May 15  2020 csh.cshrc
....
I will go ahead and delete everything here:
[root@test-pod-3 etc]# rm -rf *
rm: cannot remove 'hostname': Device or resource busy
rm: cannot remove 'hosts': Device or resource busy
rm: cannot remove 'resolv.conf': Device or resource busy
[root@test-pod-3 etc]# ls -l
total 12
-rw-r--r--. 1 0 0  11 Sep 18 08:21 hostname
-rw-r--r--. 1 0 0 207 Sep 18 08:38 hosts
-rw-r--r--. 1 0 0 115 Sep 18 08:21 resolv.conf
As you can see, even in a non-privileged pod the root user can do a lot of damage. Only hostname, hosts and resolv.conf survived, because they are mounted into the container from outside. So a non-privileged pod MUST also run as a non-root user.
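Why does dropping capabilities not stop root here? Deleting a file is governed by the ordinary permission bits of its directory, and root is simply the owner of /etc. The following Python sketch is a simplified model, not the kernel's full logic: it ignores ACLs, the sticky bit, SELinux and CAP_DAC_OVERRIDE (which these pods have dropped anyway), and it assumes /etc is owned by root:root with the common mode 0755. can_delete_in_dir is a hypothetical helper.

```python
import stat


def can_delete_in_dir(dir_mode: int, dir_uid: int, dir_gid: int,
                      uid: int, gid: int) -> bool:
    """Simplified model: unlinking an entry needs write + search (x) on the directory."""
    if uid == dir_uid:            # owner class
        need = stat.S_IWUSR | stat.S_IXUSR
    elif gid == dir_gid:          # group class (primary group only in this model)
        need = stat.S_IWGRP | stat.S_IXGRP
    else:                         # everyone else
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


if __name__ == "__main__":
    # Assumption: /etc is root:root with mode 0755.
    ETC_MODE, ETC_UID, ETC_GID = 0o755, 0, 0
    print(can_delete_in_dir(ETC_MODE, ETC_UID, ETC_GID, uid=0, gid=0))     # True
    print(can_delete_in_dir(ETC_MODE, ETC_UID, ETC_GID, uid=1000, gid=0))  # False
```

Under these assumptions root passes the owner check without needing any capability, which matches what happened in test-pod-3, while a process with UID 1000 and primary group 0 falls into the group class, which has no write bit.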
Example-4: Kubernetes Non-Privileged Pod with Non Root User
In the previous examples we used the root user inside the pod for demonstration, but in a production environment a non-privileged pod must run as a non-root user. In this example I will use a non-root user for my Kubernetes pod, so that it cannot modify root-owned directories or files. Here is my sample YAML file to create test-pod-4:
[root@centos8-1 ~]# cat privileged-pod-4.yaml
Sample Output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-4
  namespace: default
spec:
  containers:
  - name: centos
    image: centos
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      runAsUser: 1000
      capabilities:
        drop:
        - ALL
In this YAML file I have additionally added the runAsUser field to start the container process as UID 1000 instead of the root user.
Let's create this pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-4.yaml
pod/test-pod-4 created
Connect to this pod once it is in Running state:
[root@centos8-1 ~]# kubectl exec -it test-pod-4 -- bash
bash-4.4$ id
uid=1000 gid=0(root) groups=0(root)
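As you can see, the container now runs as UID 1000. The primary group is still 0 (root) because only runAsUser is set; the runAsGroup field can change that. We still don't have any privileges in this container: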
bash-4.4$ capsh --print
Current: =
Bounding set =
Ambient set =
...
Since we are now logged in as a non-root user, we can't even enter many system directories or run binaries which require root-level permissions:
bash-4.4$ /usr/sbin/useradd deepak
useradd: Permission denied.
useradd: cannot lock /etc/passwd; try again later.
Let us try to delete the files inside /etc, as we did earlier in test-pod-3 with the root user:
bash-4.4$ cd /etc/
bash-4.4$ ls -l | wc -l
128
bash-4.4$ rm -rf *
rm: cannot remove 'BUILDTIME': Permission denied
rm: cannot remove 'GREP_COLORS': Permission denied
rm: cannot remove 'NetworkManager/dispatcher.d/11-dhclient': Permission denied
rm: cannot remove 'X11/xinit/xinitrc.d/50-systemd-user.sh': Permission denied
rm: cannot remove 'X11/fontpath.d': Permission denied
rm: cannot remove 'X11/xorg.conf.d/00-keyboard.conf': Permission denied
rm: cannot remove 'X11/applnk': Permission denied
rm: cannot remove 'adjtime': Permission denied
rm: cannot remove 'aliases': Permission denied
...
bash-4.4$ ls -l | wc -l
128
So our non-root user is no longer allowed to modify root-owned files or directories, and that includes deleting them.
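The settings used for test-pod-4 are easy to check mechanically. Here is a hypothetical lint helper in Python (lint_container is not part of any Kubernetes tool) that flags a container spec, given as a dict, when it misses one of the hardening steps from this article. As a simplification it only looks at the container-level securityContext and ignores the pod-level one.

```python
from typing import Any, Dict, List


def lint_container(container: Dict[str, Any]) -> List[str]:
    """Return findings for one container dict (container-level securityContext only)."""
    name = container.get("name", "<unnamed>")
    sc = container.get("securityContext") or {}
    findings: List[str] = []
    if sc.get("privileged") is True:
        findings.append(f"{name}: privileged is true")
    if sc.get("allowPrivilegeEscalation") is not False:
        findings.append(f"{name}: allowPrivilegeEscalation is not false")
    drop = (sc.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drop:
        findings.append(f"{name}: capabilities.drop does not include ALL")
    run_as_user = sc.get("runAsUser")
    if run_as_user == 0:
        findings.append(f"{name}: runAsUser is 0 (root)")
    elif run_as_user is None and sc.get("runAsNonRoot") is not True:
        # Without either field the image's default user (root for centos) is used.
        findings.append(f"{name}: no runAsUser/runAsNonRoot, image user may be root")
    return findings


# The container of test-pod-4 from this article, written as a Python dict.
TEST_POD_4_CONTAINER = {
    "name": "centos",
    "image": "centos",
    "command": ["sh", "-c", "sleep 999"],
    "securityContext": {
        "privileged": False,
        "allowPrivilegeEscalation": False,
        "runAsUser": 1000,
        "capabilities": {"drop": ["ALL"]},
    },
}

if __name__ == "__main__":
    print(lint_container(TEST_POD_4_CONTAINER))  # []
    # The test-pod-1 container triggers all four findings.
    print(lint_container({"name": "centos", "securityContext": {"privileged": True}}))
```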
Example-5: Define specific Linux Capabilities for Kubernetes non-privileged Pod
In the previous examples we either used a fully privileged pod or dropped all the capabilities to create a non-privileged pod.
In this section we will create a partially privileged pod. For example, suppose you need to modify the nice value of a process, which requires some privilege. Why should we allow all the other capabilities just to get this one?
Instead, we can explicitly define which capabilities must be enabled inside the pod. Here is a YAML file to create test-pod-5, which demonstrates this behaviour:
[root@centos8-1 ~]# cat privileged-pod-5.yaml
Sample output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-5
  namespace: default
spec:
  containers:
  - name: centos
    image: centos
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      runAsUser: 1000
      capabilities:
        drop:
        - ALL
        add:
        - SYS_NICE
As you can see under the capabilities section, I have explicitly added the SYS_NICE capability for this pod.
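How drop and add combine can be sketched as follows. This is a simplified model, assumed to resemble how runtimes such as Docker and containerd apply drop: [ALL] before the individual adds; RUNTIME_DEFAULT is the 14-capability set observed for test-pod-2 above, and resolve_capabilities is a hypothetical helper.

```python
from typing import Iterable, Set

# Default set observed for test-pod-2 in this article (Docker runtime).
RUNTIME_DEFAULT: Set[str] = {
    "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "SETGID", "SETUID",
    "SETPCAP", "NET_BIND_SERVICE", "NET_RAW", "SYS_CHROOT", "MKNOD",
    "AUDIT_WRITE", "SETFCAP",
}


def resolve_capabilities(add: Iterable[str] = (),
                         drop: Iterable[str] = ()) -> Set[str]:
    """Simplified model: defaults, then drop ALL, then individual adds, then drops."""
    add_set = {c.upper() for c in add}
    drop_set = {c.upper() for c in drop}
    if "ALL" in add_set:
        # add: [ALL] would need the kernel's full capability list.
        raise ValueError("add: [ALL] is not modelled in this sketch")
    caps = set() if "ALL" in drop_set else set(RUNTIME_DEFAULT)
    caps |= add_set                  # individual adds, e.g. SYS_NICE
    caps -= drop_set - {"ALL"}       # individual drops are applied last here
    return caps


if __name__ == "__main__":
    print(sorted(resolve_capabilities()))                        # test-pod-2
    print(resolve_capabilities(drop=["ALL"]))                    # set()
    print(resolve_capabilities(add=["SYS_NICE"], drop=["ALL"]))  # {'SYS_NICE'}
```

This only models which capabilities the container may hold (the bounding set). Whether a process actually has them in its effective set also depends on its UID: for a non-root user such as UID 1000 without ambient capabilities, they end up only in the bounding and inheritable sets, which is what capsh reports for this pod below.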
Note that capability names are written without the CAP_ prefix, so CAP_SYS_NICE is listed as - SYS_NICE in the above example.
Let's create this pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-5.yaml
pod/test-pod-5 created
Connect to this pod once it is in Running state:
[root@centos8-1 ~]# kubectl exec -it test-pod-5 -- bash
Check the list of allowed capabilities:
bash-4.4$ capsh --print
Current: = cap_sys_nice+i
Bounding set =cap_sys_nice
Ambient set =
...
Now let us try to change the nice value of a process:
## List available processes with their NICE value
bash-4.4$ ps ax -o pid,ni,cmd
  PID  NI CMD
    1   0 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 999
    7   0 bash
   13   0 ps ax -o pid,ni,cmd
## Change the nice value of PID 7
bash-4.4$ renice -n 10 -p 7
7 (process ID) old priority 0, new priority 10
## Try to change the nice value again
bash-4.4$ renice -n 19 -p 7
7 (process ID) old priority 10, new priority 19
## Verify
bash-4.4$ ps ax -o pid,ni,cmd
  PID  NI CMD
    1   0 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 999
    7  19 bash
   16  19 ps ax -o pid,ni,cmd
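So the renice command works in this pod. Keep in mind, though, that this test alone does not prove that SYS_NICE is in effect: any user may raise the nice value (that is, lower the priority) of their own processes. Also, capsh lists cap_sys_nice only in the inheritable set (+i), because the process runs as the non-root UID 1000 without ambient capabilities. Lowering a nice value, for example renice -n -5, is the operation that actually requires CAP_SYS_NICE in the effective set.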
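To check which renice direction actually depends on CAP_SYS_NICE, you can try both on a throwaway process. This Python sketch (an illustration, not part of the original setup) uses os.setpriority, which wraps setpriority(2); try_set_nice and check_renice_directions are hypothetical helper names. For an ordinary user without CAP_SYS_NICE (and with the default RLIMIT_NICE), raising the nice value is expected to succeed and lowering it to fail.

```python
import os
import subprocess
import sys
from typing import Tuple


def try_set_nice(pid: int, value: int) -> bool:
    """Try to set the nice value of pid; return False if the kernel refuses."""
    try:
        os.setpriority(os.PRIO_PROCESS, pid, value)
        return True
    except PermissionError:  # EACCES/EPERM from setpriority(2)
        return False


def check_renice_directions() -> Tuple[bool, bool]:
    """Return (raise_ok, lower_ok) for a short-lived child process we own."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        start = os.getpriority(os.PRIO_PROCESS, child.pid)
        # Raising the nice value of our own process needs no capability.
        raise_ok = try_set_nice(child.pid, min(start + 5, 19))
        # Going below the starting value needs CAP_SYS_NICE
        # (or a permissive RLIMIT_NICE).
        lower_ok = try_set_nice(child.pid, max(start - 5, -20))
        return raise_ok, lower_ok
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    raise_ok, lower_ok = check_renice_directions()
    print("raise nice value:", "allowed" if raise_ok else "denied")
    print("lower nice value:", "allowed" if lower_ok else "denied")
```

Running it as root with CAP_SYS_NICE reports both directions as allowed, so the interesting case is a non-root user such as UID 1000.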
When should you use allowPrivilegeEscalation?
We have now covered different scenarios of Kubernetes privileged and non-privileged pods throughout this article, but we haven't covered the actual use of allowPrivilegeEscalation.
If you have followed all the examples in this tutorial, you may have realised that just by adding Linux capabilities we were able to perform privileged tasks. So why do we need privileged: true or allowPrivilegeEscalation: true? Couldn't we just add capabilities and have things work?
Technically YES, but not in every case. For example, consider the use case of running SSHD inside a non-privileged container.
You can add all the capabilities required to run the SSHD server inside a non-privileged container, but when multiple users try to SSH into your container, the SSH logins will be denied if allowPrivilegeEscalation is set to false.
This is because when multiple non-root users log in over SSH, the login process internally tries to elevate permissions, for example to create a terminal such as /dev/tty1, and several other system calls inside SSH also require privilege escalation. So, to answer the question above, adding capabilities alone may not solve every requirement.
Let us take a practical example using test-pod-6 via the following YAML file:
[root@centos8-1 ~]# cat privileged-pod-6.yaml
Sample Output:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-6
  namespace: default
spec:
  containers:
  - name: centos
    image: golinuxcloud/centos:latest
    command: ['sh', '-c', 'sleep 999']
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      runAsUser: 1000
      capabilities:
        drop:
        - ALL
        add:
        - SYS_NICE
  imagePullSecrets:
  - name: regcred
The imagePullSecrets part of this YAML file refers to the regcred secret, which holds the registry credentials used to pull the golinuxcloud/centos:latest image. Otherwise the YAML file looks almost the same as the one we used in Example-5 for test-pod-5. The difference is that I have installed the sudo package inside the image and also created a user with UID 1000 so that we have a proper user, although that is not relevant to this topic.
Let me create this pod:
[root@centos8-1 ~]# kubectl create -f privileged-pod-6.yaml
pod/test-pod-6 created
Connect to this pod once running:
[root@centos8-1 ~]# kubectl exec -it test-pod-6 -- bash
[deepak@test-pod-6 /]$
Now UID 1000 has a proper username. This has nothing to do with privileged or non-privileged pods, it just looks nicer 🙂
Coming back to the topic: I added the SYS_NICE capability, so I should be able to change the nice value of a process as a non-root user:
[deepak@test-pod-6 /]$ ps ax -o pid,ni,cmd
  PID  NI CMD
    1   0 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 999
    8   0 bash
   27   0 ps ax -o pid,ni,cmd
[deepak@test-pod-6 /]$ renice -n 19 -p 8
8 (process ID) old priority 0, new priority 19
[deepak@test-pod-6 /]$ ps ax -o pid,ni,cmd
  PID  NI CMD
    1   0 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 999
    8  19 bash
   29  19 ps ax -o pid,ni,cmd
As expected, user deepak is able to use the renice command.
But can we execute the same command as sudo?
[deepak@test-pod-6 /]$ sudo renice -n 10 -p 8
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
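NO, the user is NOT allowed to run any commands with sudo, because allowPrivilegeEscalation is set to false. This setting controls the no_new_privs flag of the container process: with no_new_privs set, the kernel does not honour the setuid bit of /usr/bin/sudo, so sudo never gets an effective UID of 0, which is exactly what the error message above complains about.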
So, I hope this is clear. If you need to run a process with sudo or su, which requires privilege escalation, then you must use allowPrivilegeEscalation: true. Otherwise, you can keep adding capabilities with both privileged and allowPrivilegeEscalation set to false.
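You can confirm the effect of allowPrivilegeEscalation: false from inside the container: Linux (4.10 and later) reports the no_new_privs flag as the NoNewPrivs field of /proc/<PID>/status, so grep NoNewPrivs /proc/self/status works as a quick shell check. The Python sketch below does the same parsing; no_new_privs is a hypothetical helper, and it assumes a Python interpreter is available where you run it.

```python
import os
from typing import Optional


def no_new_privs(status_text: str) -> Optional[bool]:
    """Return the NoNewPrivs flag from /proc/<PID>/status text, or None if missing."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "NoNewPrivs":
            return value.strip() == "1"
    return None  # older kernels do not report this field


if __name__ == "__main__":
    path = "/proc/self/status"
    if os.path.exists(path):
        with open(path) as f:
            flag = no_new_privs(f.read())
        # Expected to be True inside test-pod-6 (allowPrivilegeEscalation: false).
        print("no_new_privs:", flag)
```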
Summary
In this tutorial we explored Kubernetes Privileged and NON-Privileged Pod examples and covered different scenarios to create privileged and non-privileged pods properly. Adding specific capabilities for an individual process inside the container can be tricky. I normally prefer to use the strace command to identify all the system calls made by the respective process, and then use the strace output to work out the list of capabilities the process may require.
Alternatively, you can start your pod as privileged and, once the process is running, query grep Cap /proc/<PID>/status, which prints the capability sets held by that process (CapInh, CapPrm, CapEff, CapBnd and CapAmb) as HEX values. You can then use capsh --decode=<HEX_VALUE> to get the list of capabilities behind each value. Keep in mind that these are the capabilities the process holds, which is not necessarily the same as what it actually uses.
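For reference, here is a minimal Python sketch of what capsh --decode does with those HEX values: each set bit in the mask is a capability number. The CAP_NAMES table follows the capability numbers in linux/capability.h up to cap_checkpoint_restore (40), which is an assumption about your kernel; unknown bits are printed as cap_<number> by this sketch only. decode_caps and caps_from_status are hypothetical helper names.

```python
from typing import Dict, List

# Index = capability number, as defined in linux/capability.h (0..40).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_caps(hex_mask: str) -> List[str]:
    """Return capability names for the bits set in a mask like '00000000a80425fb'."""
    mask = int(hex_mask, 16)
    names: List[str] = []
    bit = 0
    while mask:
        if mask & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        mask >>= 1
        bit += 1
    return names


def caps_from_status(status_text: str) -> Dict[str, List[str]]:
    """Decode every Cap* line (CapInh, CapPrm, CapEff, ...) of /proc/<PID>/status."""
    result: Dict[str, List[str]] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Cap"):
            result[key] = decode_caps(value.strip())
    return result


if __name__ == "__main__":
    print(decode_caps("00000000a80425fb"))  # the 14 defaults seen in test-pod-2
    print(decode_caps("0000000000800000"))  # ['cap_sys_nice']
```

For example, 00000000a80425fb decodes to the same 14 capabilities, in the same order, that capsh reported for test-pod-2, and 0000000000800000 decodes to cap_sys_nice alone.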
I have tried to explain everything related to this topic, but it is a very vast topic and covering every possible scenario may not be possible. If you face any issues, you can reach out to me using the comment section and I will definitely try to help you.
What's Next
Kubernetes SecurityContext Capabilities Explained [Examples]