How to limit Kubernetes resources (CPU & Memory)

By default, the containers of a Pod can use all of the system resources available on the host machine, unless you have set a limit on the allowed resources or the Container runs in a namespace that has a default CPU limit, in which case the Container is automatically assigned that default. The Linux kernel provides cgroups, which can be used to limit CPU and memory, and docker run uses cgroups to implement these limits. So when you specify a Pod, you can optionally declare how much of each resource a Container needs and cap its usage to avoid overutilization. The most common resources to specify are CPU and memory (RAM).

When you specify a resource request for the containers in a Pod, the scheduler uses this information at Pod creation time to decide which node to place the Pod on, based on the resources available there. The kubelet also reserves at least the request amount of that system resource specifically for that container to use.
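The placement decision described above can be pictured as a simple comparison. The following is only a simplified sketch, not the scheduler's actual code: pod_fits is a hypothetical helper, and it assumes all values are CPU millicores taken from a single node.

```shell
# Hypothetical helper (a simplification of the scheduler's resource check):
# succeed if the Pod's total CPU request still fits on the node.
#   $1 = CPU the node can allocate to Pods (millicores)
#   $2 = CPU already requested by Pods on that node (millicores)
#   $3 = total CPU requested by the new Pod's containers (millicores)
pod_fits() (
  allocatable=$1
  already_requested=$2
  pod_request=$3
  [ $((already_requested + pod_request)) -le "$allocatable" ]
)

# A Pod requesting 500m does not fit on a 2000m node with 1600m already requested.
if pod_fits 2000 1600 500; then echo "fits"; else echo "does not fit"; fi
```

Note that the comparison uses requests, not actual usage: a node can be idle and still be considered full if the requests of its Pods add up to its allocatable capacity.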

Different Kubernetes resource types

At the time of writing this tutorial, there were three different resource types for which requests and limits could be imposed on a Pod and Container:

CPU
Memory
Hugepages (Kubernetes v1.14 or newer)

CPU and memory are collectively referred to as compute resources, or just resources. Compute resources are measurable quantities that can be requested, allocated, and consumed.

Resource requests and limits for Pod and Container

When we talk about such thresholds, we normally have both a soft limit and a hard limit: a soft value for the resources allowed to individual Pods and Containers, and an upper bound above which usage is denied. In Kubernetes, the soft limit is defined as requests and the hard limit is defined as limits.

If the node where a Pod is running has enough of a resource available, it's possible (and allowed) for a container to use more resource than its request for that resource specifies. However, a container is not allowed to use more than its resource limit.

Each Container of a Pod can specify one or more of the following:

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.limits.hugepages-<size>
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.requests.hugepages-<size>

To understand how hugepages-2Mi / hugepages-1Gi relate to the underlying Linux mechanisms, why they cannot be overcommitted, and how to keep a Pod from staying Pending, see HugePages vs Transparent HugePages.

NOTE

If a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit.
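To see the behaviour from this note, you could create a Pod whose container sets only limits. The sketch below just writes such a manifest to disk. The file name limits-only.yml, the Pod name limits-only, the container name web and the nginx image are all hypothetical choices for illustration.

```shell
# Write a hypothetical manifest whose container declares limits but no requests.
cat > limits-only.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: limits-only          # hypothetical Pod name
spec:
  containers:
  - name: web                # hypothetical container name
    image: nginx
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
EOF

# After creating the Pod with "kubectl create -f limits-only.yml",
# the resources recorded for the container can be read back with:
#   kubectl get pod limits-only -o jsonpath='{.spec.containers[0].resources}'
```

According to the note above, the stored Pod should then show requests for CPU and memory that match the limits, even though the manifest never mentions them.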

Understanding resource units

Kubernetes uses different units to measure CPU and memory:

Defining CPU limit

Limits and requests for CPU resources are measured in cpu units.
One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers and 1 hyperthread on bare-metal Intel processors.
Whenever we specify CPU requests or limits, we specify them in terms of CPU cores.
Because we often want to request or limit only a fraction of a whole CPU core, we can specify that fraction either as a decimal or as a millicore value.
For example, a value of 0.5 represents half of a core.
As there are 1,000 millicores in a single core, we could also specify half a CPU as 500m.
The smallest amount of CPU that can be specified is 1m or 0.001.
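The equivalence between the decimal and millicore notations can be checked with a small shell helper. This is an illustrative sketch only: cpu_to_millicores is a hypothetical function, not a kubectl feature, and it assumes the input is either a plain decimal number of cores or an integer followed by m.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity such as
# "0.5", "2" or "500m" into millicores.
cpu_to_millicores() (
  qty=$1
  case $qty in
    *m)
      # Already expressed in millicores: strip the trailing "m".
      echo "${qty%m}"
      ;;
    *)
      # Decimal cores: 1 core = 1000 millicores, rounded to the nearest integer.
      awk -v v="$qty" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }'
      ;;
  esac
)

cpu_to_millicores 0.5    # prints 500
cpu_to_millicores 500m   # prints 500
cpu_to_millicores 2      # prints 2000
```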

Defining Memory limit

Limits and requests for memory are measured in bytes.
You can express memory as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, k (note the lowercase k for kilo).
You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki

Memory units supported by Kubernetes

Decimal name	Bytes	Suffix	Binary name	Bytes	Suffix
kilobyte	1000	k	kibibyte	1024	Ki
megabyte	1000^2	M	mebibyte	1024^2	Mi
gigabyte	1000^3	G	gibibyte	1024^3	Gi
terabyte	1000^4	T	tebibyte	1024^4	Ti
petabyte	1000^5	P	pebibyte	1024^5	Pi
exabyte	1000^6	E	exbibyte	1024^6	Ei
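To make the difference between the decimal and power-of-two suffixes concrete, here is a minimal sketch that converts a memory quantity into bytes. mem_to_bytes is a hypothetical helper. It assumes an integer value and, for brevity, handles only the suffixes up to G and Gi.

```shell
# Hypothetical helper: convert a Kubernetes memory quantity with an integer
# value (for example "128Mi" or "1G") into bytes.
mem_to_bytes() (
  qty=$1
  num=${qty%%[A-Za-z]*}      # leading digits
  suffix=${qty#"$num"}       # whatever follows the digits
  case $suffix in
    "") mult=1 ;;
    k)  mult=1000 ;;
    M)  mult=$((1000 * 1000)) ;;
    G)  mult=$((1000 * 1000 * 1000)) ;;
    Ki) mult=1024 ;;
    Mi) mult=$((1024 * 1024)) ;;
    Gi) mult=$((1024 * 1024 * 1024)) ;;
    *)  echo "unsupported suffix: $suffix" >&2; exit 1 ;;
  esac
  echo $((num * mult))
)

mem_to_bytes 128Mi   # prints 134217728
mem_to_bytes 128M    # prints 128000000
mem_to_bytes 1Gi     # prints 1073741824
```

The gap between 128M and 128Mi is about 6 MB, so it is worth being deliberate about which family of suffixes you use.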

How pods with resource limits are managed

When the kubelet starts a container, the CPU and memory limits are passed to the container runtime, which is then responsible for managing the resource usage of that container.
If you are using Docker, the CPU limit (in millicores) is multiplied by 100 to give the amount of CPU time, in microseconds, that the container is allowed to use every 100 ms. If the CPU is under load, once a container has used its quota it has to wait until the next 100 ms period before it can continue to use the CPU.
The method used to share CPU resources between different processes running in cgroups is called the Completely Fair Scheduler, or CFS; it works by dividing CPU time between the different cgroups.
This typically means assigning a certain number of time slices to a cgroup. If the processes in one cgroup are idle and don't use their allocated CPU time, that time becomes available to processes in other cgroups.
If a container reaches its memory limit, it is killed by the kernel's out-of-memory (OOM) killer, and it might then be restarted.
If a container is using more memory than the requested amount, it becomes a candidate for eviction if and when the node begins to run low on memory.

Here is an example of a container being killed with OOM:

Nov 28 23:27:36 worker-1.example.com kernel: Memory cgroup out of memory: Kill process 1331 (mysqld) score 2250 or sacrifice child
Nov 28 23:27:36 worker-1.example.com kernel: Killed process 1331 (mysqld) total-vm:1517000kB, anon-rss:126500kB, file-rss:42740kB, shmem-rss:0kB
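Returning to the CPU side, the quota arithmetic described above for Docker (limit in millicores multiplied by 100, per 100 ms period) can be sketched as follows. cfs_quota_us is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: CPU time in microseconds that a container with the
# given CPU limit may use in each 100 ms (100000 microsecond) period.
cfs_quota_us() (
  millicores=${1%m}            # accept "500m" or "500"
  echo $((millicores * 100))
)

cfs_quota_us 500m    # prints 50000, i.e. 50 ms of CPU time per 100 ms period
cfs_quota_us 250m    # prints 25000
cfs_quota_us 1000m   # prints 100000, one full core for the whole period
```

So the 500m limit used in the example Pod below lets each container run for at most half of every 100 ms period when the CPU is contended.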

Example: Define CPU and Memory limit for containers

It is always a good idea to use a separate namespace when defining resource limits so that the resources you create in this exercise are isolated from the rest of your cluster.

[root@controller ~]# kubectl create namespace cpu-limit
namespace/cpu-limit created

Verify the newly created namespace:

[root@controller ~]# kubectl get ns
NAME              STATUS   AGE
cpu-limit         Active   46s
default           Active   24h
kube-node-lease   Active   24h
kube-public       Active   24h
kube-system       Active   24h

In this example we will create a single Pod with two containers, a MySQL database and WordPress, each with CPU and memory requests and limits.

[root@controller ~]# cat pod-resource-limit.yml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  namespace: cpu-limit
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: wp
    image: wordpress
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

Next we create the pod using this YAML file:

[root@controller ~]# kubectl create -f pod-resource-limit.yml
pod/frontend created

Monitor the status of the newly created containers:

[root@controller ~]# kubectl get pods  -n cpu-limit -o wide
NAME                         READY   STATUS              RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
frontend                     0/2     ContainerCreating   0          38s   <none>      worker-1.example.com   <none>           <none>

It may take some time to create the containers, so check the status again after a few seconds:

[root@controller ~]# kubectl get pods -n cpu-limit -o wide
NAME                        READY   STATUS    RESTARTS   AGE    IP          NODE                   NOMINATED NODE   READINESS GATES
frontend                    2/2     Running   1          3m6s   10.36.0.2   worker-1.example.com   <none>           <none>

You can check the details of the Pod using kubectl describe:

[root@controller ~]# kubectl describe pods frontend -n cpu-limit
Name:         frontend
Namespace:    cpu-limit
Priority:     0
Node:         worker-1.example.com/192.168.43.49
Start Time:   Sat, 28 Nov 2020 23:31:10 +0530
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.36.0.2
IPs:
  IP:  10.36.0.2
Containers:

...

    State:          Running
      Started:      Sat, 28 Nov 2020 23:31:51 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        250m
      memory:     64Mi
    Environment:  <none>

If you do not specify a CPU limit

If you do not specify a CPU limit for a Container, then one of these situations applies:

The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.
The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
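As an illustration of the second case, a LimitRange that supplies default CPU values for the cpu-limit namespace might look like the manifest written below. This is a sketch under assumptions: the file name, the object name cpu-limit-range and the 500m/250m values are hypothetical and chosen to mirror the Pod example above.

```shell
# Write a hypothetical LimitRange manifest with default CPU values.
cat > cpu-limit-range.yml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range      # hypothetical name
  namespace: cpu-limit
spec:
  limits:
  - type: Container
    default:                 # default CPU limit for containers that set none
      cpu: "500m"
    defaultRequest:          # default CPU request for containers that set none
      cpu: "250m"
EOF
```

It could then be created with kubectl create -f cpu-limit-range.yml, after which new containers in that namespace that omit a CPU limit would be assigned the default.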

Deleting Pod

To delete the Pod from the cpu-limit namespace, you can use:

[root@controller ~]# kubectl delete pods -n cpu-limit frontend
pod "frontend" deleted

Conclusion

In this Kubernetes tutorial we learned how to set resource requests and limits on a Pod's containers to control their usage of resources such as CPU, memory and hugepages. You can define both a soft limit (requests) and a hard limit (limits) for individual containers. Alternatively, you can assign a resource quota to a namespace and define a LimitRange, in which case containers in that namespace that do not specify their own values get the defaults from the LimitRange.