Kubernetes SecurityContext Capabilities Introduction
Kubernetes lets you control the level of privilege assigned to each Pod and container. Using Kubernetes SecurityContext Capabilities we can add or remove Linux capabilities from a Pod or container so the container can be hardened against intrusion. SecurityContext capabilities are tightly coupled with the Pod Security Policy (PSP), which defines the security policy for the entire cluster; later we map Pods to these policies to control their privileges.
In this tutorial we will give a brief overview of Pod Security Policy (for a detailed understanding of PSP you can read my older article Create Pod Security Policy Kubernetes [Step-by-Step]). Then we will explore Kubernetes SecurityContext Capabilities in detail with multiple examples covering different scenarios.
Create Pod Security Policy
First we will create the Pod Security Policy which we will use throughout this article. Here is my PSP definition file along with the Cluster Role and Cluster Role Binding:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: testns-psp-01
spec:
  privileged: true
  allowPrivilegeEscalation: true
  requiredDropCapabilities:
  allowedCapabilities:
  - '*'
  defaultAddCapabilities:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: testns-psp-01
rules:
- apiGroups:
  - policy
  resourceNames:
  - testns-psp-01
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: testns-psp-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: testns-psp-01
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io
Here is the output of my installed PSP:
]# kubectl get psp | grep -E 'PRIV|testns'
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
testns-psp-01 true * RunAsAny RunAsAny RunAsAny RunAsAny false *
In our Pod Security Policy we have not added any restrictions; basically, everything is allowed.
How to create a privileged container inside a Kubernetes Pod
In this example we will first create a privileged pod which should have all the capabilities. In most cases, the following Kubernetes SecurityContext definition should be enough to start a privileged pod:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  namespace: testns
spec:
  selector:
    matchLabels:
      app: dev
  serviceName: test-pod
  replicas: 2
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          ## enable privileged mode
          privileged: true
Create this statefulset:
]# kubectl create -f test-statefulset.yaml
statefulset.apps/test-statefulset created
Check the list of allowed capabilities:
]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+i
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1) gid=1025(user1)
As you can see, all the capabilities are allowed in our container.
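As a quick aside, the numeric entries 35, 36 and 37 in the output above are capabilities that this capsh build has no names for (on recent kernels these indices correspond to CAP_WAKE_ALARM, CAP_BLOCK_SUSPEND and CAP_AUDIT_READ). Each capability is a single bit in a 64-bit mask indexed by its number, so on a kernel with capabilities 0-37 a fully privileged bounding set is simply the all-ones mask. A minimal sketch:

```python
# Each Linux capability occupies one bit of a 64-bit mask, indexed by
# its number in <linux/capability.h>. With capabilities 0..37 present,
# a fully privileged bounding set is the all-ones mask computed below.
NUM_CAPS = 38  # capabilities 0 through 37, as in the capsh output above

full_mask = (1 << NUM_CAPS) - 1
print(f"{full_mask:x}")  # → 3fffffffff
```

Reading CapBnd from /proc/1/status inside this privileged container should show the same value.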
In some cases, if you don't see all the capabilities added to your container, you can use the following Kubernetes SecurityContext Capabilities:
...
    securityContext:
      runAsUser: 1025
      privileged: true
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - ALL
...
This YAML assumes that the respective Pod Security Policy allows all capabilities.
How to create a non-privileged container inside a Kubernetes Pod
Now you may wonder: if privileged: true enables all privileges, shouldn't setting it to false make the pod run with no privileges at all?
Let's test this theory with a practical example. We have updated our StatefulSet definition file with the following Kubernetes SecurityContext Capabilities fields:
...
    containers:
    - name: test-statefulset
      image: golinux-registry:8090/secure-context-img:latest
      command: ["supervisord", "-c", "/etc/supervisord.conf"]
      imagePullPolicy: Always
      securityContext:
        runAsUser: 1025
        privileged: false
        allowPrivilegeEscalation: false
...
So, basically I have disabled privileged mode and any kind of privilege escalation inside the container. Once we create this StatefulSet, let's verify the available capabilities in the pod:
]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1) gid=1025(user1) groups=
As you can see, even with privileged: false the container still has multiple capabilities enabled, so it is not actually a non-privileged pod.
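The 14 capabilities listed above are the default set that common container runtimes (such as Docker and containerd) grant to non-privileged containers. As an illustration, the same set can be folded into the hex bitmask format that the kernel reports in /proc/<pid>/status; this is a sketch assuming the standard bit indices from <linux/capability.h>:

```python
# Standard Linux capability bit indices (subset) from <linux/capability.h>.
CAP_INDEX = {
    "cap_chown": 0, "cap_dac_override": 1, "cap_fowner": 3, "cap_fsetid": 4,
    "cap_kill": 5, "cap_setgid": 6, "cap_setuid": 7, "cap_setpcap": 8,
    "cap_net_bind_service": 10, "cap_net_raw": 13, "cap_sys_chroot": 18,
    "cap_mknod": 27, "cap_audit_write": 29, "cap_setfcap": 31,
}

def caps_to_mask(names):
    """OR together the bit for each named capability."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_INDEX[name]
    return mask

# The 14 default runtime capabilities seen in the capsh output above.
print(f"{caps_to_mask(CAP_INDEX):016x}")  # → 00000000a80425fb
```

This is the hex value you would see for CapEff in /proc/<pid>/status for a process holding exactly this default set.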
Solution-1: Drop all capabilities using requiredDropCapabilities inside Pod Security Policy
I would not recommend this solution, because a PSP applies to the whole cluster and it does not make sense to disable all privileges in the PSP just for one pod. That said, you can use RBAC to limit the usage of this PSP to certain users, in which case this method is viable.
Either way, I will share the steps to drop all privileges using a Pod Security Policy, and you can choose your preferred method.
We will edit our testns-psp-01 policy using the kubectl edit psp testns-psp-01 command, which opens the PSP definition in your default editor. After updating it, this is what the Kubernetes SecurityContext Capabilities section of my PSP looks like:
...
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
...
So, basically I have removed the allowedCapabilities section and added the requiredDropCapabilities field, which drops all the default capabilities from the container inside the Pod.
We will re-deploy our statefulset to pick up the new changes. Next verify the available capabilities inside the container:
]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=
Now, as you can see, no capabilities are assigned to our container, so this is a proper non-privileged container inside a Kubernetes Pod.
Solution-2: Using Kubernetes SecurityContext Capabilities in the Pod definition file
Next we will use the Pod definition file to start a non-privileged container by using Kubernetes SecurityContext Capabilities field. In addition to privileged: false, we must explicitly drop all the capabilities as shown below:
...
    containers:
    - name: test-statefulset
      image: golinux-registry:8090/secure-context-img:latest
      command: ["supervisord", "-c", "/etc/supervisord.conf"]
      imagePullPolicy: Always
      securityContext:
        runAsUser: 1025
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
...
Let us re-deploy our statefulset and verify the applied Linux capabilities inside the container:
]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=
So as expected, the container has dropped all the capabilities and can be used as a non-privileged container in Kubernetes Pod.
How to assign limited Linux capabilities to a container inside Kubernetes Pod
Now that we know how to create privileged and non-privileged pods, let me show you some examples of creating a pod with limited privileges.
In this example we will add only the SYS_TIME capability to our container inside the Kubernetes Pod. To achieve this, I have modified my Pod Security Policy to allow privileged pods and allow all capabilities to be added. We don't want to restrict this at the PSP level; rather, we will control it at the Pod level.
]# kubectl get psp | grep -E 'PRIV|testns'
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
testns-psp-01 true * RunAsAny MustRunAsNonRoot RunAsAny RunAsAny false *
Here is the snippet of my Kubernetes SecurityContext Capabilities which I will use to first drop all the capabilities and then add only the SYS_TIME capability.
IMPORTANT NOTE: If you provide the add field with SYS_TIME first and later provide the drop ALL field, then all the capabilities would be dropped from the container. So make sure you use drop first, followed by add:
...
spec:
  containers:
  - name: test-statefulset
    image: golinux-registry:8090/secure-context-img:latest
    command: ["supervisord", "-c", "/etc/supervisord.conf"]
    imagePullPolicy: Always
    securityContext:
      runAsUser: 1025
      privileged: false
      allowPrivilegeEscalation: true
      capabilities:
        drop:
        - ALL
        add:
        - SYS_TIME
...
Let us re-deploy our statefulset and check the applied capabilities:
]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: = cap_sys_time+i
Bounding set =cap_sys_time
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1) gid=1025(user1) groups=
As expected, the container has dropped all the other capabilities and only SYS_TIME is applied.
How to check the list of capabilities applied to a container inside Kubernetes Pod
Let me show you different ways to get the list of capabilities applied to your Kubernetes Pod's container:
Method-1: Check the list of Linux capabilities in a container using capsh --print command
We will use the capsh command to print the list of capabilities applied to any container:
[user1@test-statefulset-1 /]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap
Here, we have two fields:
Current: this field contains the list of capabilities currently in use by the system processes
Bounding set: this field contains the list of capabilities which can be used, if required, by any system or application process
You may also notice +i at the end of the Current set of capabilities. This refers to the thread capability sets; there are three different types of thread capability set which can be defined or allocated:
- Effective: the capabilities used by the kernel to perform permission checks for the thread.
- Permitted: the capabilities that the thread may assume (i.e., a limiting superset for the effective and inheritable sets). If a thread drops a capability from its permitted set, it can never re-acquire that capability (unless it exec()s a set-user-ID-root program).
- Inheritable: the capabilities preserved across an execve(2). A child created via fork(2) inherits copies of its parent's capability sets. Using capset(2), a thread may manipulate its own capability sets, or, if it has the CAP_SETPCAP capability, those of a thread in another process.
Method-2: Check applied capabilities per process
The above command showed us the capabilities for the whole container; we can also list the capabilities used by an individual process. For example, on my container I have the following processes running:
[user1@test-statefulset-1 /]$ ps -ef
UID PID PPID C STIME TTY TIME CMD
user1 1 0 0 17:38 ? 00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
user1 9 1 0 17:38 ? 00:00:00 /usr/sbin/rsyslogd -n -f /tmp/rsyslog.conf -i /tmp/rsyslog.pid
root 10 1 0 17:38 ? 00:00:00 /usr/sbin/sshd -D -f /opt/ssh/sshd_config -p 5022 -E /tmp/sshd.log
user1 643 0 0 17:48 pts/0 00:00:00 bash
user1 1214 643 0 17:58 pts/0 00:00:00 ps -ef
Now I want to check the list of capabilities used by my SSHD process which has PID 10.
[user1@test-statefulset-1 /]$ grep Cap /proc/10/status
CapInh: 00000000a82425fb
CapPrm: 00000000a82425fb
CapEff: 00000000a82425fb
CapBnd: 00000000a82425fb
CapAmb: 0000000000000000
Here,
- CapInh = Inheritable capabilities
- CapPrm = Permitted capabilities
- CapEff = Effective capabilities
- CapBnd = Bounding set
- CapAmb = Ambient capability set
So we get hex values for the different capability sets. To convert a hex value into a human-readable list of capabilities we use the following command:
[user1@test-statefulset-1 /]$ capsh --decode=00000000a82425fb
0x00000000a82425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap
So, now we have the list of capabilities used by the SSHD process.
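If capsh happens not to be available inside a container, the same decode can be sketched in a few lines of Python, assuming the standard capability bit indices from <linux/capability.h>:

```python
# Capability names in bit-index order from <linux/capability.h>
# (index 0 is cap_chown, index 1 is cap_dac_override, and so on).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog",
]

def decode(mask_hex):
    """Decode a Cap* hex mask from /proc/<pid>/status into names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]

# The CapEff value of the SSHD process shown above.
print(",".join(decode("00000000a82425fb")))
```

This prints the same 15-capability list that capsh --decode reported above.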
How to assign Linux capability to individual file or binary (setcap)
By default many Linux system binaries have some capabilities assigned to them. You can check this using the getcap command. For example, to check the capabilities assigned to the ping command:
[user1@test-statefulset-1 /]$ getcap `which ping`
/usr/bin/ping = cap_net_admin,cap_net_raw+p
So the ping command requires cap_net_admin and cap_net_raw to function properly.
Let's use ping with the default capabilities:
[user1@test-statefulset-1 /]$ capsh -- -c "/bin/ping -c 1 localhost"
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.018 ms
--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms
This works; now let's try the same command but with the cap_net_admin capability dropped:
[user1@test-statefulset-1 /]$ capsh --drop=cap_net_admin -- -c "/bin/ping -c 1 localhost"
unable to raise CAP_SETPCAP for BSET changes: Operation not permitted
As you can see, the command fails with an Operation not permitted error: our non-root user lacks CAP_SETPCAP, which is needed to drop capabilities from the bounding set.
To add a capability to any file we can use the setcap command. Let us add a capability to the /usr/sbin/sshd binary; currently, as you can see, there are no capabilities assigned to it:
[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd
Next I will add the NET_ADMIN capability to this binary:
[user1@test-statefulset-1 /]$ setcap cap_net_admin+i /usr/sbin/sshd
Verify the same again:
[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd
/usr/sbin/sshd = cap_net_admin+i
Summary
In this tutorial we explored different areas related to Kubernetes SecurityContext Capabilities. We covered the following topics:
- Create a privileged and non-privileged container inside a Kubernetes Pod.
- How to add or drop all the capabilities from a Pod.
- How to add a single capability or a pre-defined set of capabilities to a container
- Understanding more about Linux Capabilities
- How to check if capabilities are assigned to a container
Further Readings
man page for capabilities
Linux Capabilities In Practice
man page for setcap
Hi All,
I want to know how to create a YAML for a StatefulSet pod with CentOS 8, MariaDB 10.6.5 and the latest Apache. It should run with systemd enabled. It should use the storage class from Ceph Storage. I also want all the pods to communicate with each other internally and be accessible externally over SSH through a load balancer.
I have tried many sites, but was not able to create one with systemd.
Please help me to create one for production use case.
Regards,
Mark.
The requirement sounds like a complete production architecture with end-to-end design.
Answering the first question: I hope you have a docker registry; if not, you can sign up at dockerhub where you can create your own private/public registry.
Next, connect to your docker hub using any Linux box and pull the centos 8 image.
Connect to the centos8 image and do any modification you want, i.e. enabling systemd or installing packages.
Next you can either save the image as an archive or push it to your registry.
Now you can use this image to do all sorts of deployments in Kubernetes.
Hi Admin,
Thanks for the reply.
I have a local docker registry.
I have production workloads running as Deployments, not StatefulSets, and I do not have any issue there with Deployments.
Here in StatefulSet, if I pull the image with the YAML configuration, the same image which works with the systemd command in a Deployment doesn't work in the StatefulSet.
Using a Deployment I am able to access the pods externally via the IP exposed through the MetalLB load balancer.
When I try the same YAML for StatefulSets, it doesn't connect, as I was not able to start the SSHD service because the systemd command does not work in my StatefulSets.
Can you please help me with a YAML file showing how to create a StatefulSet with the systemd command working in the pod? That would be of great help; then I will be able to expose the pods externally via the load balancer.
My main agenda is to have a StatefulSet for the production environment, able to scale based on load, and also exposed with a static IP so that the web applications can connect to MariaDB and store data to the PVC.
I don’t see how systemd and StatefulSet/Deployment are directly related; can you share some details on the error you are facing?
We are using sshd with both StatefulSets and Deployments. Although we use supervisord, systemd is far more flexible and better than supervisord.
You can check this article on how sshd can be configured as non-root user using systemd or supervisord inside Kubernetes container. The same can be used for root user as well
SOLVED: Run SSHD as non-root user (without sudo) in Linux
I will not be able to approve your last comment because it is very long and may not be useful for others. But I noticed that you are removing all systemd-related files from the image and then starting your container with supervisord?
If your requirement is to use systemd then you would have to have a privileged container and you should not delete the systemd files.
Also avoid using supervisord in that case.
You can share the same data as an attachment via mail at admin@golinuxcloud.com and let me see if I can patch your YAML file