Kubernetes SecurityContext Capabilities Explained [Examples]



 

Kubernetes SecurityContext Capabilities Introduction

With Kubernetes you can control the level of privilege assigned to each Pod and container. We can use Kubernetes SecurityContext capabilities to add or remove Linux capabilities from a container, making it more resistant to intrusion. SecurityContext capabilities are tightly coupled with Pod Security Policy, which defines a policy for the entire cluster; the PSP is then mapped to Pods to control the privileges they may request.

In this tutorial we will give a brief overview of Pod Security Policy (for a detailed understanding of PSP you can read my older article Create Pod Security Policy Kubernetes [Step-by-Step]). Then we will explore Kubernetes SecurityContext capabilities in detail, with multiple examples covering different scenarios.

 

Create Pod Security Policy

First we will create the Pod Security Policy which we will use throughout this article. Here is my PSP definition file along with the Cluster Role and Cluster Role Binding:

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: testns-psp-01
spec:
  privileged: true
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - '*'
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: testns-psp-01
rules:
- apiGroups:
  - policy
  resourceNames:
  - testns-psp-01
  resources:
  - podsecuritypolicies
  verbs:
  - use

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: testns-psp-01
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: testns-psp-01
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
  - kind: Group
    name: system:serviceaccounts
    apiGroup: rbac.authorization.k8s.io

Here is the output of my installed PSP:

]# kubectl get psp | grep -E 'PRIV|testns'
NAME                                PRIV    CAPS               SELINUX    RUNASUSER          FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
testns-psp-01                       true    *                  RunAsAny   RunAsAny           RunAsAny    RunAsAny    false            *

In our Pod Security Policy we have not added any restrictions; essentially everything is allowed.

 

How to create a privileged container inside a Kubernetes Pod

In this example we will first create a privileged pod, which should have all the capabilities. In most cases, the following Kubernetes SecurityContext definition should be enough to start a privileged pod:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  namespace: testns
spec:
  selector:
    matchLabels:
      app: dev
  serviceName: test-pod
  replicas: 2
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          ## enable privileged mode
          privileged: true

Create this statefulset:

]# kubectl create -f test-statefulset.yaml 
statefulset.apps/test-statefulset created

Check the list of allowed capabilities:

]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+i
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)

As you can see, all the capabilities are allowed in our container.

In some cases, if you don't see all the capabilities added to your container, you can use the following Kubernetes SecurityContext capabilities:

...
        securityContext:
          runAsUser: 1025
          privileged: true
          allowPrivilegeEscalation: true
          capabilities:
            add:
             - ALL
...

This YAML assumes that the respective Pod Security Policy allows all capabilities.

 

How to create a non-privileged container inside a Kubernetes Pod

Now you may wonder: if setting privileged to true enables all privileges, should setting it to false make the pod run with no privileges at all?

Let's test this theory with a practical example. We have updated our statefulset definition file with the following Kubernetes SecurityContext fields:

...
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: false
...

So, basically I have disabled privileged mode and any kind of privilege escalation inside the container. Once we create this statefulset, let's verify the available capabilities in the pod:

]# kubectl exec -it test-statefulset-0 -n testns -- capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As you can see, even with privileged: false the container still has multiple capabilities enabled (the default set granted by the container runtime), so it is not actually a non-privileged pod.
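Comparing the two capsh outputs makes it clear exactly what privileged mode grants on top of the runtime's default set. A minimal Python sketch; both lists are copied verbatim from the named capabilities in the outputs shown above:

```python
# Capabilities reported by `capsh --print` in the privileged container (named ones only)
privileged = {
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog",
}

# Capabilities reported with privileged: false (the runtime's default set)
default = {
    "cap_chown", "cap_dac_override", "cap_fowner", "cap_fsetid", "cap_kill",
    "cap_setgid", "cap_setuid", "cap_setpcap", "cap_net_bind_service",
    "cap_net_raw", "cap_sys_chroot", "cap_mknod", "cap_audit_write",
    "cap_setfcap",
}

# Capabilities that only privileged mode grants
extra = sorted(privileged - default)
print(len(extra), "extra capabilities, including cap_sys_admin:", "cap_sys_admin" in extra)
```

Note that dangerous capabilities such as cap_sys_admin and cap_sys_module are only present in the privileged set, which is why privileged mode should be granted sparingly.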

 

Solution-1: Drop all capabilities using requiredDropCapabilities inside Pod Security Policy

I would not recommend this solution, because a PSP applies to the whole cluster and it does not make sense to disable all privileges in the PSP just for one pod. That said, you can use RBAC to limit the usage of this PSP to certain users, in which case this method is viable.

Either way, I will share the steps to drop all the privileges using a Pod Security Policy, and you can choose your preferred method.

We will edit our PSP using the kubectl edit psp testns-psp-01 command (PSPs are cluster-scoped, so no namespace is needed), which will open the PSP definition in your default editor. After updating it, this is what the capabilities-related part of my PSP looks like:

...
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'

So, basically I have removed the allowedCapabilities section and added the requiredDropCapabilities field, which drops all the default capabilities from every container admitted under this PSP.

We will re-deploy our statefulset to pick up the new changes. Next verify the available capabilities inside the container:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

Now, as you can see, no capabilities are assigned to our container any more. This is a properly non-privileged container inside a Kubernetes Pod.

 

Solution-2: Using Kubernetes SecurityContext Capabilities in the Pod definition file

Next we will use the Pod definition file to start a non-privileged container by using the Kubernetes SecurityContext capabilities field. In addition to privileged: false, we must explicitly drop all the capabilities as shown below:

...
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: false
          capabilities:
            drop:
             - ALL
...

Let us re-deploy our statefulset and verify the applied Linux capabilities inside the container:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: =
Bounding set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

So as expected, the container has dropped all the capabilities and can be used as a non-privileged container in a Kubernetes Pod.

 

How to assign limited Linux capabilities to a container inside Kubernetes Pod

Now that we know how to run privileged and non-privileged pods, let me show you an example of creating a pod with limited privileges.

In this example we will only add SYS_TIME capability to our container inside the Kubernetes Pod. To achieve this, I have modified my Pod Security Policy to allow privileged pods and allow all capabilities to be added. We don't want to restrict this at PSP level, rather we will control this at Pod level.

]# kubectl get psp | grep -E 'PRIV|testns'
NAME                                PRIV    CAPS               SELINUX    RUNASUSER          FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
testns-psp-01                       true    *                  RunAsAny   MustRunAsNonRoot   RunAsAny    RunAsAny    false            *

 

Here is the snippet of my Kubernetes SecurityContext capabilities which I will use to drop all the capabilities and then add back only the SYS_TIME capability.

IMPORTANT NOTE:

The drop list is always processed before the add list, regardless of the order in which the two fields appear in the YAML (mapping keys are unordered). So drop: ALL combined with add: SYS_TIME always results in a container that holds only SYS_TIME.
...
    spec:
      containers:
      - name: test-statefulset
        image: golinux-registry:8090/secure-context-img:latest
        command: ["supervisord", "-c", "/etc/supervisord.conf"]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1025
          privileged: false
          allowPrivilegeEscalation: true
          capabilities:
            drop:
             - ALL
            add:
             - SYS_TIME
...

Let us re-deploy our statefulset and check the applied capabilities:

]# kubectl exec -it test-statefulset-1 -n testns -- capsh --print
Current: = cap_sys_time+i
Bounding set =cap_sys_time
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=1025(user1)
gid=1025(user1)
groups=

As expected, the container has dropped all the other capabilities and retains only SYS_TIME.

 

How to check the list of capabilities applied to a container inside Kubernetes Pod

Let me show you different ways to get the list of capabilities applied to your Kubernetes Pod's container:

 

Method-1: Check the list of Linux capabilities in a container using capsh --print command

We will use capsh command to print the list of applied capabilities to any container.

[user1@test-statefulset-1 /]$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

Here, we have two fields:

  • Current: the list of capabilities currently available to the process
  • Bounding set: the list of capabilities which can be acquired, if required, by any system or application process

You may also notice +i at the end of the Current set of capabilities. This flag refers to the thread capability sets; each thread has three main capability sets:

  • Effective: the capabilities used by the kernel to perform permission checks for the thread.
  • Permitted: the capabilities that the thread may assume (i.e., a limiting superset for the effective and inheritable sets). If a thread drops a capability from its permitted set, it can never re-acquire that capability (unless it exec()s a set-user-ID-root program).
  • Inheritable: the capabilities preserved across an execve(2). A child created via fork(2) inherits copies of its parent's capability sets. Using capset(2), a thread may manipulate its own capability sets, or, if it has the CAP_SETPCAP capability, those of a thread in another process.

So +i means these capabilities are present in the inheritable set.

 

Method-2: Check applied capabilities per process

The above command showed us the capabilities of the current shell process; we can also list the capabilities used by any individual process. For example, on my container I have the following processes running:

[user1@test-statefulset-1 /]$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
user1          1     0  0 17:38 ?        00:00:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf
user1          9     1  0 17:38 ?        00:00:00 /usr/sbin/rsyslogd -n -f /tmp/rsyslog.conf -i /tmp/rsyslog.pid
root        10     1  0 17:38 ?        00:00:00 /usr/sbin/sshd -D -f /opt/ssh/sshd_config -p 5022 -E /tmp/sshd.log
user1        643     0  0 17:48 pts/0    00:00:00 bash
user1       1214   643  0 17:58 pts/0    00:00:00 ps -ef

Now I want to check the list of capabilities used by my sshd process, which has PID 10:

[user1@test-statefulset-1 /]$ grep Cap /proc/10/status 
CapInh:	00000000a82425fb
CapPrm:	00000000a82425fb
CapEff:	00000000a82425fb
CapBnd:	00000000a82425fb
CapAmb:	0000000000000000

Here,

  • CapInh = Inherited capabilities
  • CapPrm = Permitted capabilities
  • CapEff = Effective capabilities
  • CapBnd = Bounding set
  • CapAmb = Ambient capabilities set
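These fields can also be read programmatically. A minimal Python sketch; the read_caps helper name is just illustrative, and we read the current process's own status file here, since any PID under /proc works the same way:

```python
def read_caps(pid="self"):
    """Parse the Cap* fields from /proc/<pid>/status into integers."""
    caps = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # Lines look like: "CapEff:\t00000000a82425fb"
            if line.startswith("Cap"):
                name, value = line.split()
                caps[name.rstrip(":")] = int(value, 16)
    return caps

caps = read_caps()
print({k: hex(v) for k, v in caps.items()})
```

This gives the same hex masks as grep, ready for further decoding.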

So we get hex values for the different capability sets. To convert a hex value into a human-readable list of capabilities we can use the following command:

[user1@test-statefulset-1 /]$ capsh --decode=00000000a82425fb
0x00000000a82425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

So, now we have the list of capabilities used by the SSHD process.
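If capsh is not available inside the container, the decoding can be reproduced by hand: each capability has a fixed bit position (documented in capabilities(7)), and the hex value is simply a bitmask. A minimal Python sketch:

```python
# Capability names indexed by their kernel bit position (see capabilities(7))
CAPS = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog",
]

def decode(mask):
    """Return the capability names whose bit is set in the mask."""
    return [name for bit, name in enumerate(CAPS) if mask & (1 << bit)]

# Same mask as in the capsh --decode example above
print(",".join(decode(0x00000000A82425FB)))
```

Running this prints the same fifteen capabilities that capsh --decode reported for our sshd process.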

 

How to assign Linux capability to individual file or binary (setcap)

By default, many Linux system binaries have file capabilities assigned to them. You can check this using the getcap command. For example, to check the capabilities assigned to the ping command:

[user1@test-statefulset-1 /]$ getcap `which ping`
/usr/bin/ping = cap_net_admin,cap_net_raw+p

So the ping command requires cap_net_admin and cap_net_raw to function properly.

Let's use ping with the default capabilities:

[user1@test-statefulset-1 /]$ capsh -- -c "/bin/ping -c 1 localhost"
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.018 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.018/0.018/0.018/0.000 ms

This works; now let's try the same command but with the cap_net_admin capability dropped:

[user1@test-statefulset-1 /]$ capsh --drop=cap_net_admin -- -c "/bin/ping -c 1 localhost"
unable to raise CAP_SETPCAP for BSET changes: Operation not permitted

As you can see, the command fails with an Operation not permitted error. Note that the failure actually comes from capsh itself: dropping a capability from the bounding set requires CAP_SETPCAP, which our non-root user does not have, so ping is never even executed.

To add a capability to a file we can use the setcap command. Let us add a capability to the /usr/sbin/sshd binary; as you can see, there are currently no capabilities assigned to it:

[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd

Next I will add NET_ADMIN capability to this binary file:

[user1@test-statefulset-1 /]$ setcap cap_net_admin+i /usr/sbin/sshd

Verify the same again:

[user1@test-statefulset-1 /]$ getcap /usr/sbin/sshd
/usr/sbin/sshd = cap_net_admin+i

 

Summary

In this tutorial we explored different areas related to Kubernetes SecurityContext capabilities. We covered the following topics:

  • Create a privileged and non-privileged container inside a Kubernetes Pod.
  • How to add or drop all the capabilities from a Pod.
  • How to add single or pre-defined set of capabilities to a container
  • Understanding more about Linux Capabilities
  • How to check if capabilities are assigned to a container

 

Further Readings

man page for capabilities
Linux Capabilities In Practice
man page for setcap

 

Deepak Prasad


He is the founder of GoLinuxCloud and brings over a decade of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive experience, he excels in various domains, from development to DevOps, Networking, and Security, ensuring robust and efficient solutions for diverse projects. You can connect with him on his LinkedIn profile.

