Overview of Pod Security Policy in Kubernetes
PSP is the abbreviation for Pod Security Policy in Kubernetes. A PSP is a cluster-scoped resource that checks a set of conditions before a pod is admitted and scheduled to run in a cluster. This is achieved via a Kubernetes admission controller, which evaluates every pod creation request for compliance with the PSP assigned to the pod.
PSP allows you to control:
- Running of privileged containers
- Usage of host namespaces
- Usage of host networking and ports
The following table explains the different fields used with a PSP:
Field | Usage |
---|---|
privileged | Allow containers to run in privileged mode, which grants capabilities such as access to host mounts, the ability to change filesystem settings, and many more. |
hostPID, hostIPC | The container shares the host's process ID and IPC namespaces, making host processes visible to the container. |
hostNetwork, hostPorts | The container has access to the host network and ports. |
volumes | Allow volume types such as configMap, emptyDir, or secret. |
allowedHostPaths | Whitelist of host paths that hostPath volumes may use, e.g. /tmp. |
allowedFlexVolumes | Allow specific FlexVolume drivers, e.g. azure/kv. |
fsGroup | Set a GID or range of GIDs that own the pod's volumes. |
readOnlyRootFilesystem | Mount the container's root filesystem as read-only. |
runAsUser, runAsGroup, supplementalGroups | Define the container's UID and GIDs; here you can require non-root users or groups. |
allowPrivilegeEscalation, defaultAllowPrivilegeEscalation | Restrict privilege escalation by a process. |
defaultAddCapabilities, requiredDropCapabilities, allowedCapabilities | Add or drop Linux capabilities as needed. |
seLinux | Define the SELinux context of the container. |
allowedProcMountTypes | Define the allowed proc mount types for the container. |
forbiddenSysctls, allowedUnsafeSysctls | Control which sysctls the pod may set. |
annotations | AppArmor and seccomp profiles used by containers. |
Getting started with Kubernetes Pod Security Policy
In this section we will look at different Pod Security Policy examples and understand the fields used in the definition YAML file:
Example-1: Restrict hostIPC, hostPID, hostNetwork and hostPorts using PSP
The following listing shows a sample PodSecurityPolicy, which prevents pods from using the host’s IPC, PID, and network namespaces, prevents running privileged containers, and blocks the use of most host ports (except ports 10000-11000 and 13000-14000). The policy doesn’t set any constraints on what users, groups, or SELinux contexts the container can run as.
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default
spec:
  # Containers are not allowed to use the host's IPC, PID, or network namespace
  hostIPC: false
  hostPID: false
  hostNetwork: false
  # Containers can only bind to host ports 10000-11000 or 13000-14000
  hostPorts:
  - min: 10000
    max: 11000
  - min: 13000
    max: 14000
  # Containers cannot run in privileged mode
  privileged: false
  # Containers must run with a read-only root filesystem
  readOnlyRootFilesystem: true
  # Containers can run as any user and any group
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  # Containers can use any SELinux context
  seLinux:
    rule: RunAsAny
  # All volume types can be used
  volumes:
  - '*'
```
Here,
- Containers aren’t allowed to use the host’s IPC, PID, or network namespace.
- They can only bind to host ports 10000 to 11000 (inclusive) or host ports 13000 to 14000.
- Containers cannot run in privileged mode.
- Containers are forced to run with a read-only root filesystem.
- Containers can run as any user and any group.
- They can also use any SELinux groups they want.
- All volume types can be used in pods.
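To see the policy in action, a pod that requests a host namespace would be rejected at admission, assuming this is the only policy that applies to it. A hypothetical example (pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-net-pod   # hypothetical name
spec:
  hostNetwork: true    # violates hostNetwork: false in the policy above
  containers:
  - name: main
    image: alpine:latest
    command: ["sleep", "3600"]
```

Submitting this pod would be denied by the admission controller with an "unable to validate against any pod security policy" style error.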
Example-2: Restrict runAsUser, fsGroup, and supplementalGroups using PSP
The policy in the previous example doesn’t impose any limits on which users and groups containers can run as, because it uses the RunAsAny rule for the runAsUser, fsGroup, and supplementalGroups fields. If you want to constrain the list of allowed user or group IDs, change the rule to MustRunAs and specify the range of allowed IDs.
```yaml
runAsUser:
  rule: MustRunAs
  ranges:
  # A single range with min equal to max allows one specific ID
  - min: 2
    max: 2
fsGroup:
  rule: MustRunAs
  ranges:
  # Multiple ranges are supported; IDs can be 2-10 or 20-30
  - min: 2
    max: 10
  - min: 20
    max: 30
supplementalGroups:
  rule: MustRunAs
  ranges:
  - min: 2
    max: 10
  - min: 20
    max: 30
```
Here,
- Add a single range with min equal to max to set one specific ID.
- Multiple ranges are supported—here, group IDs can be 2–10 or 20–30 (inclusive).
If the pod spec tries to set either of those fields to a value outside these ranges, the pod will not be accepted by the API server.
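For example, a pod that complies with these rules might declare its IDs explicitly in its security context (pod and container names here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compliant-pod        # hypothetical name
spec:
  securityContext:
    runAsUser: 2             # must be exactly 2 under the runAsUser rule above
    fsGroup: 25              # inside the allowed 20-30 range
    supplementalGroups: [5]  # inside the allowed 2-10 range
  containers:
  - name: main
    image: alpine:latest
    command: ["sleep", "3600"]
```

If runAsUser were set to, say, 3, the API server would reject the pod.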
Example-3: Restrict allowed, default, and disallowed capabilities using PSP
As you have seen, you can control whether containers run in privileged mode, and you can define a more fine-grained permission configuration by adding or dropping Linux kernel capabilities in each container. Three fields influence which capabilities containers can or cannot use:
- The allowedCapabilities field specifies which capabilities pod authors can add in the securityContext.capabilities field of the container spec.
- The defaultAddCapabilities field lists capabilities that are added to every container by default.
- The requiredDropCapabilities field contains the list of capabilities that are dropped automatically from every container.
Here, we specify certain capabilities in the PodSecurityPolicy; you can check the capabilities(7) man page for the complete list of capabilities you can use with a PSP:
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
spec:
  # Allow containers to add the SYS_TIME capability
  allowedCapabilities:
  - SYS_TIME
  # Automatically add the CHOWN capability to every container
  defaultAddCapabilities:
  - CHOWN
  # Require containers to drop SYS_ADMIN and SYS_MODULE
  requiredDropCapabilities:
  - SYS_ADMIN
  - SYS_MODULE
...
```
Here,
- Allow containers to add the SYS_TIME capability.
- Automatically add the CHOWN capability to every container.
- Require containers to drop the SYS_ADMIN and SYS_MODULE capabilities.
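Under such a policy, a container can opt in to the allowed SYS_TIME capability through its own security context; a minimal sketch (names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: set-time-pod         # hypothetical name
spec:
  containers:
  - name: main
    image: alpine:latest
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        add:
        - SYS_TIME           # permitted because it appears in allowedCapabilities
```

CHOWN would be added to this container automatically via defaultAddCapabilities, while SYS_ADMIN and SYS_MODULE are dropped from every container.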
Example-4: Restricting the types of volumes which any pod can use
The last thing a PodSecurityPolicy resource can do is define which volume types users can add to their pods. At the minimum, a PodSecurityPolicy should allow using at least the emptyDir
, configMap
, secret
, downwardAPI
, and the persistentVolumeClaim
volumes.
```yaml
kind: PodSecurityPolicy
spec:
  volumes:
  - emptyDir
  - configMap
  - secret
  - downwardAPI
  - persistentVolumeClaim
```
If multiple PodSecurityPolicy resources are in place, pods can use any volume type defined in any of the policies (the union of all volumes lists is used).
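For example, if a second policy (hypothetical name hostpath-psp) also applied to a pod and allowed only hostPath volumes, that pod could use hostPath in addition to the five types listed above:

```yaml
kind: PodSecurityPolicy
metadata:
  name: hostpath-psp   # hypothetical second policy
spec:
  volumes:
  - hostPath
```

The effective allowed set is the union: emptyDir, configMap, secret, downwardAPI, persistentVolumeClaim, and hostPath.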
Lab Environment
I have already deployed a multi-node Kubernetes cluster in my previous tutorial, so I will use the same setup for the demonstration.
Workflow to create Pod Security Policy
Follow this workflow to create a Pod Security Policy in Kubernetes:
- Create a PSP.
- Create a ClusterRole with the 'use' verb, which authorizes pod deployment controllers to use the policies.
- Create a ClusterRoleBinding, which enforces the policy for groups (e.g. system:authenticated or system:unauthenticated) or Service Accounts (SA).
Step-1: Create Pod Security Policy
In this section we will go ahead and create our first Pod Security Policy on the Kubernetes cluster. Following is the content of my restricted-psp.yaml file:
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  # Required to prevent escalations to root.
  privileged: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  allowPrivilegeEscalation: false
  # Drop all capabilities
  requiredDropCapabilities:
  - ALL
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
```
I have added comments on each field to explain its purpose. Let us go ahead and create this PSP:
]# kubectl create -f restricted-psp.yaml
podsecuritypolicy.policy/restricted-psp created
List the created pod security policy:
]# kubectl get psp | grep -E 'PRIV|restricted-psp'
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
restricted-psp false RunAsAny MustRunAsNonRoot MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
So our PSP has been successfully created.
Step-2: Create Cluster Role
Next we will create the ClusterRole that grants access to use the desired policies. Here is the content of my ClusterRole YAML file restricted-psp-role.yaml:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: restricted-psp
rules:
- apiGroups:
  # Use the 'policy' apiGroup for the PodSecurityPolicy resource
  - policy
  resourceNames:
  # Name of the Pod Security Policy; you can add more than one
  - restricted-psp
  resources:
  # Resource name for PodSecurityPolicy
  - podsecuritypolicies
  verbs:
  # Provide access to 'use'
  - use
```
Create this Cluster Role:
]# kubectl create -f restricted-psp-role.yaml
clusterrole.rbac.authorization.k8s.io/restricted-psp created
List the role which we just created:
]# kubectl get clusterrole | grep restricted-psp
restricted-psp 2021-09-03T05:12:07Z
Step-3: Create Cluster Role Binding
Next we need to bind the ClusterRole using a ClusterRoleBinding to grant its usage to pods. Here is my sample file content from restricted-psp-role-bind.yaml:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: restricted-binding-psp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # Name of the cluster role to bind
  name: restricted-psp
subjects:
# Authorize all service accounts in all namespaces
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  # You may restrict to a namespace using system:serviceaccounts:<authorized namespace>
  name: system:authenticated
```
Let us create this Cluster Role Binding:
]# kubectl create -f restricted-psp-role-bind.yaml
clusterrolebinding.rbac.authorization.k8s.io/restricted-binding-psp created
List the ClusterRoleBinding which we created above:
]# kubectl get clusterrolebinding | grep -i restricted-binding
restricted-binding-psp ClusterRole/restricted-psp 2m28s
Now let us go ahead and cover different examples of creating Kubernetes cluster resources with a security context so that pods start with limited privileges and capabilities.
Step-4: Verify Pod Security Policy using StatefulSet
Create StatefulSet
In this example I will try to create a non-privileged pod that starts as the 'root' user:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset
  namespace: deepak
spec:
  selector:
    matchLabels:
      app: dev
  serviceName: test-pod
  replicas: 2
  template:
    metadata:
      labels:
        app: dev
    spec:
      containers:
      - name: test-statefulset
        image: alpine:latest
        command: ["sleep", "10000"]
        securityContext:
          # Run the pod as root user
          runAsUser: 0
          # Pod will start with no privilege
          privileged: false
          # Allow privilege escalation
          allowPrivilegeEscalation: true
          # Privilege escalation is allowed but first drop all capabilities
          capabilities:
            drop:
            - ALL
            # Allow only the NET_BIND_SERVICE capability
            add:
            - NET_BIND_SERVICE
```
Let's try to create this statefulset:
]# kubectl create -f test-statefulset.yaml
statefulset.apps/test-statefulset created
If you check the status of the statefulset, none of the replica pods are created:
]# kubectl get statefulset -n deepak
NAME READY AGE
test-statefulset 0/2 28s
Troubleshoot "unable to validate against any pod security policy" Errors
So this would mean that something has failed. We can use kubectl describe to get more details:
]# kubectl describe statefulset test-statefulset -n deepak
....Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 44s (x2 over 44s) statefulset-controller create Pod test-statefulset-0 in StatefulSet test-statefulset failed error: pods "test-statefulset-0" is forbidden: unable to validate against any pod security policy: [spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden spec.containers[0].securityContext.capabilities.add: Invalid value: "NET_BIND_SERVICE": capability may not be added spec.containers[0].securityContext.allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed spec.containers[0].securityContext.runAsUser: Invalid value: 0: running with the root UID is forbidden]
Why are we getting "unable to validate against any pod security policy"?
Here, our pod creation has failed because, if you remember, our restricted-psp does not allow pods to run as root. Check the PSP again:
]# kubectl get psp | grep -E 'PRIV|restricted-psp'
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
restricted-psp false RunAsAny MustRunAsNonRoot MustRunAs MustRunAs false configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim
The RUNASUSER field contains MustRunAsNonRoot, while we were trying to run as the root user, hence the command above failed to create the pods.
To overcome this, either:
- start the pod as a non-root user, or
- modify the Pod Security Policy to set the runAsUser rule to RunAsAny
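A minimal sketch of the first option: change the runAsUser value in the StatefulSet's container securityContext to a non-root UID (1000 here is an assumed example). Note that the error messages above also flag allowPrivilegeEscalation and the capability addition, so depending on the policies in effect those may need fixing as well:

```yaml
securityContext:
  # Any non-zero UID satisfies MustRunAsNonRoot; 1000 is an example
  runAsUser: 1000
  privileged: false
  # Avoids the "privilege escalation ... not allowed" violation
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
```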
Delete the existing statefulset as it is in failed state:
]# kubectl delete statefulset test-statefulset -n deepak
statefulset.apps "test-statefulset" deleted
Re-create this statefulset after fixing the runAsUser field to use a non-root user:
]# kubectl create -f test-statefulset.yaml
statefulset.apps/test-statefulset created
Verify StatefulSet Status
Verify the statefulset status:
]# kubectl get statefulset -n deepak
NAME READY AGE
test-statefulset 2/2 2m1s
Verify Applied PodSecurityPolicy to the Pod
Verify the PSP applied to our Pods:
]# kubectl describe pod test-statefulset-0 -n deepak | grep psp
kubernetes.io/psp: restricted-psp
So, both our pods are in running state. Let us connect to one of the pods:
[root@ncs20fp1-w2-egress-control-02 hardening]# kubectl exec -it test-statefulset-0 -n deepak -- bash
List applied Capabilities to the container
List the applied capabilities to this Pod
[sdl@test-statefulset-0 /]$ capsh --print
Since we dropped all the capabilities from the container and added NET_BIND_SERVICE in our StatefulSet YAML file, only that one capability is applied to the pod.
Limitation of Pod Security Policy
- PodSecurityPolicySpec has references to allowedCapabilities, privileged, and hostNetwork. These enforcements can only work on Linux-based runtimes.
- If you are creating a pod using controllers (e.g. replication controller), it’s worth checking if PSPs are authorized for use by those controllers.
- Once PSPs are enabled cluster-wide and a pod doesn’t start because of an incorrect PSP, troubleshooting the issue becomes tedious. Moreover, if PSPs are enabled cluster-wide in production clusters, you need to test every component in your cluster, including dependencies like mutating admission controllers, and watch for conflicting verdicts.
- Azure Kubernetes Service (AKS) has deprecated support for PSPs in favor of OPA Gatekeeper, which supports more flexible policies using the OPA engine.
- PSPs are deprecated and scheduled to be removed in Kubernetes 1.25.
- Kubernetes can have edge cases where PSPs can be bypassed.
Summary
Pod Security Policy is quite a vast topic where you can control different areas such as privileges, namespaces, networking, and ports. In this tutorial we covered the privileged pod section with an example demonstrating how pod creation can fail when the pod doesn't match the PodSecurityPolicy fields.