Kubernetes provides three ways of performing health checks. With HTTP or TCP checks, Kubernetes attempts to connect to a particular endpoint and reports the container as healthy on a successful connection. For application-specific checks, Kubernetes can use an exec probe to run a command or script from within the container; anything that exits with a 0 status is considered healthy. We will cover these in more detail with different examples and demos in this tutorial.
Using Probes for Kubernetes Health Checks
Kubernetes has the concept of probes to provide the seam between the orchestration engine and the application developer. Kubernetes uses these probes to find out more about the inner state of the application at hand. Probes are defined as part of the container spec and are executed by the kubelet on the node that runs the Pod.
There are three probes available with Kubernetes used to perform health checks:
- Liveness probe: used to periodically check whether a container is still alive; if the probe fails, the container is restarted.
- Readiness probe: used to make sure a Pod is not marked as ready (and does not receive Service traffic) until the readiness probe succeeds.
- Startup probe: if we define a startup probe for a container, Kubernetes does not execute the liveness or readiness probes until the startup probe succeeds.
Each of these probes can use one of three handlers to check the health of the application inside the container (a combined sketch follows this list):
- tcpSocket: checks that a TCP connection to the specified port can be established
- exec: runs a command inside the container; an exit code of 0 means success
- httpGet: sends an HTTP GET request; a response code between 200 and 399 means success
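As an illustration, here is a minimal, hypothetical container spec that uses one probe of each handler type. The Pod name, image, port, and probe command are assumptions made for this sketch only; the full examples later in this tutorial demonstrate each handler in a working Pod.

apiVersion: v1
kind: Pod
metadata:
  name: probe-handlers-demo          # hypothetical name for illustration
spec:
  containers:
  - name: web
    image: nginx                     # assumed image serving HTTP on port 80
    ports:
    - containerPort: 80
    startupProbe:                    # exec handler: exit code 0 means success
      exec:
        command: ["sh", "-c", "service nginx status | grep running"]
    readinessProbe:                  # httpGet handler: 200-399 response means success
      httpGet:
        path: /
        port: 80
    livenessProbe:                   # tcpSocket handler: a successful TCP connect means success
      tcpSocket:
        port: 80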
Configure Probes
From kubernetes.io:
You can use the following fields in the probe spec to control the behaviour of liveness, readiness, and startup probes:
- initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
- periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
- timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
- successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
- failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
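As a quick reference, here is a hedged sketch of a livenessProbe that sets all of these fields explicitly. The handler and the values are illustrative only; pick values that suit your own application.

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 15   # wait 15s after container start before the first probe
  periodSeconds: 10         # probe every 10 seconds
  timeoutSeconds: 2         # each probe attempt times out after 2 seconds
  successThreshold: 1       # must be 1 for liveness and startup probes
  failureThreshold: 3       # restart the container after 3 consecutive failures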
Kubernetes liveness probe
The kubelet watches over your containers. If a container process crashes, the kubelet will restart it based on the restart policy. But this is not always enough: your process may not crash, but instead run into an infinite loop or a deadlock, and the restart policy alone may not be nuanced enough. With a liveness probe, you get to decide when a container is considered alive.
Kubernetes uses the liveness probe to decide when a container needs to be killed and when a new instance should be started instead. The restart happens at the container level: the kubelet restarts the failing container according to the Pod's restart policy. At the Pod level, the rule is that a Pod is considered healthy only if all of its containers report as healthy.
Some guidelines for using liveness probes to perform health checks in Kubernetes:
- Liveness probes should be used with caution. The wrong configuration of this probe can result in cascading failures in your services and container restart loops. As a quick experiment, take a Deployment that uses a readiness probe and replace the readiness probe with a liveness probe that has a similar configuration but very short timeouts and delays; you will experience multiple random crashes and poor availability of the application!
- Do not use liveness probes unless you have a good reason for this. A good reason may, for example, be a known issue with a deadlock in your application that has an as yet unknown root cause.
- Execute simple and fast checks that determine the status of the process, not its dependencies. In other words, you do not want to check external dependencies' statuses in the liveness probe—this can lead to cascading failures due to an avalanche of container restarts and overloading a small subset of Service Pods.
- If your process running in the container is able to crash or exit whenever it encounters an unrecoverable error, you probably do not need a liveness probe at all.
- Use conservative settings for initialDelaySeconds to avoid premature container restarts and falling into a restart loop.
Example-1: Define liveness probe with exec
In this example we will create an nginx container in a simple Pod and use an exec probe to check the status of the nginx service. As long as the nginx service is in the 'running' state the Pod is considered alive; otherwise the container is restarted, subject to configuration options such as failureThreshold (the number of retries).
Following is the YAML file for my Pod with liveness probe defined:
[root@controller ~]# cat liveness-eg-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
  namespace: default
spec:
  containers:
  - name: test-liveness-probe
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - service nginx status | grep running
      initialDelaySeconds: 10
      periodSeconds: 5
The relevant part is the livenessProbe section. First, we define a command that Kubernetes will execute as a probe inside the container. In our case, we have an nginx container and use an exec probe to execute the "service nginx status" command and grep for the string 'running'. By default, the nginx service is expected to be in the running state.
The other two settings, initialDelaySeconds and periodSeconds, define how long Kubernetes should wait after starting the container before it first executes the probe, and how frequently the probe should be executed thereafter. In our case, Kubernetes waits 10 seconds before executing the first probe and then executes a probe every 5 seconds.
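If you want to verify what the probe actually runs, you can execute the same command manually inside the container once the Pod is up; this is just a sanity check, not something Kubernetes requires. An exit status of 0 means the probe would pass:

[root@controller ~]# kubectl exec liveness-demo -- sh -c 'service nginx status | grep running'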
We will create this pod and check the status of the Pod:
[root@controller ~]# kubectl create -f liveness-eg-1.yaml
pod/liveness-demo created
Check the status of the Pod:
So at the moment our Pod is in the Running state, which means the liveness probe is able to find the nginx service running. We can also use the kubectl describe command to get more details.
For demonstration purposes, I will intentionally fail the liveness probe by setting an incorrect probe command and create another pod named fail-liveness-demo.
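The manifest for this failing pod is not shown here; a plausible sketch, assuming a deliberately broken service name in the probe command, could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: fail-liveness-demo
  namespace: default
spec:
  containers:
  - name: test-liveness-probe
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - service nginx-wrong status | grep running   # deliberately wrong service name
      initialDelaySeconds: 10
      periodSeconds: 5

Even though the probe command is intentionally wrong, the pod initially shows as Running: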
[root@controller ~]# kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
fail-liveness-demo   1/1     Running   0          23s
liveness-demo        1/1     Running   0          4m11s
So let's use the kubectl describe pod fail-liveness-demo command to get the details:
Under Events, we can see that the liveness probe has failed. By default, the probe is retried 3 times (the default failureThreshold) before the kubelet gives up and restarts the container.
We can check the status of the pods again after a few seconds:
[root@controller ~]# kubectl get pods
NAME                 READY   STATUS             RESTARTS   AGE
fail-liveness-demo   0/1     CrashLoopBackOff   3          4m41s
liveness-demo        1/1     Running            0          8m29s
As expected, our fail-liveness-demo pod keeps failing its liveness probe and has entered the CrashLoopBackOff state. You can get more details using the kubectl describe pod fail-liveness-demo command:
Let's delete both these pods:
[root@controller ~]# kubectl delete pod liveness-demo
pod "liveness-demo" deleted
[root@controller ~]# kubectl delete pod fail-liveness-demo
pod "fail-liveness-demo" deleted
Example-2: Define liveness probe with httpGet
In this example we will use httpGet to probe the liveness of a pod. For containers running a web server, we can have the kubelet send an HTTP GET request to the server that is running in the container and listening on port 80. If the handler for the server's / path returns a success code, the kubelet considers the container to be alive and healthy. If the handler returns a failure code, the kubelet kills the container and restarts it.
Here is my sample YAML file to create a Pod with a liveness probe using httpGet:
[root@controller ~]# cat liveness-eg-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-httpget
  namespace: default
spec:
  containers:
  - name: test-liveness-probe
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
Create the pod and check the status:
[root@controller ~]# kubectl create -f liveness-eg-3.yaml
pod/liveness-probe-httpget created
[root@controller ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
liveness-probe-httpget   1/1     Running   0          7s
So our pod is in the Running state; we will also check the events of this pod using the kubectl describe command:
[root@controller ~]# kubectl describe pods liveness-probe-httpget
In the sample output from my controller there do not seem to be any liveness probe failure events, so our nginx server must be up and running.
Again, for demonstration purposes, I intentionally gave a wrong port for the liveness probe in another pod.
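That pod's manifest is not shown either; a plausible sketch, assuming the pod name liveness-probe-httpget-fail and the port 8080 seen in the event below, would be:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-httpget-fail
  namespace: default
spec:
  containers:
  - name: test-liveness-probe
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /
        port: 8080        # nginx listens on 80, so this probe will fail
      initialDelaySeconds: 10
      periodSeconds: 5

After the initial delay, the liveness probe starts failing. Here is a sample event from that Pod: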
Warning Unhealthy 1s kubelet Liveness probe failed: Get "http://10.36.0.2:8080/": dial tcp 10.36.0.2:8080: connect: connection refused
and the respective pod has entered the CrashLoopBackOff state with continuous restarts:
[root@controller ~]# kubectl get pods
NAME                          READY   STATUS             RESTARTS   AGE
liveness-probe-httpget        1/1     Running            0          7m21s
liveness-probe-httpget-fail   0/1     CrashLoopBackOff   6          6m26s
Let's delete both these pods:
[root@controller ~]# kubectl delete pod liveness-probe-httpget liveness-probe-httpget-fail
pod "liveness-probe-httpget" deleted
pod "liveness-probe-httpget-fail" deleted
Example-3: Define liveness probe with tcpSocket
Kubernetes also supports health checks via simple TCP socket checks. With this configuration, the kubelet will attempt to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy; if it cannot, the probe is considered a failure.
Following is the YAML file to create this Pod with tcpSocket as the probe for the liveness health check:
[root@controller ~]# cat liveness-eg-4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-tcpsocket
  namespace: default
spec:
  containers:
  - name: test-liveness-probe
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
Create this pod and check the status:
[root@controller ~]# kubectl create -f liveness-eg-4.yaml
pod/liveness-probe-tcpsocket created
[root@controller ~]# kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
liveness-probe-tcpsocket   1/1     Running   0          99s
The pod appears to be in the Running state, but we should also check the Events to make sure there are no liveness probe failures:
[root@controller ~]# kubectl describe pods liveness-probe-tcpsocket
Sample Events output from my controller node for this command:
Let's delete this pod:
[root@controller ~]# kubectl delete pod liveness-probe-tcpsocket
pod "liveness-probe-tcpsocket" deleted
Kubernetes readiness probe
Kubernetes uses a readiness probe to decide when a service instance, that is, a container, is ready to accept traffic. Pods that are not ready (a Pod is ready only if all of its containers are considered ready) will be removed from the Service Endpoints list until they become ready again. In other words, it is a signal for notifying that a given Pod can be used for requests incoming to the Service.
Some recommended guidelines for using readiness probes to perform health checks in Kubernetes:
- Use this probe whenever your containers may not be ready to properly serve traffic as soon as the container is started.
- Ensure that you check the cache warm-up or database migration status during readiness probe evaluation. You may also consider starting the actual process of a warm-up if it hasn't been started yet, but use this approach with caution—a readiness probe will be executed constantly throughout the life cycle of a Pod, which means you shouldn't do any costly operations for every request. Alternatively, you may want to use a startup probe for this purpose, newly-introduced in Kubernetes 1.16.
- For microservice applications that expose HTTP endpoints, consider always configuring the httpGet readiness probe. This will ensure that all cases are covered when a container is successfully running but the HTTP server is not fully initialized.
- It is a good idea to use a separate, dedicated HTTP endpoint for readiness checks in your application; a common convention is to use /health (see the sketch after this list).
- If you are checking the state of dependencies (external database and logging services) in this type of probe, be careful with shared dependencies, such as a shared SQL Server database. In this case, you should consider using a probe timeout that is greater than the maximum allowed timeout for the external dependency; otherwise, you may get cascading failures and lower availability instead of occasionally increased latency.
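For example, if your application exposes a dedicated health endpoint, the readiness probe might look like the following sketch. The /health path and the port 8080 are assumptions about a hypothetical application, not part of the examples in this tutorial:

readinessProbe:
  httpGet:
    path: /health        # dedicated health endpoint exposed by the application
    port: 8080           # hypothetical application port
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2      # keep the check fast; it runs throughout the Pod's life cycle
  failureThreshold: 3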
Example: Define readiness probe with httpGet
In this example we will use a readiness probe with httpGet to determine the health of the containers in a Deployment. Following is my sample YAML file to create the deployment:
[root@controller ~]# cat readiness-eg-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    type: dev
  name: readiness-probe-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      type: dev
  template:
    metadata:
      labels:
        type: dev
    spec:
      containers:
      - image: nginx
        name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 1
          successThreshold: 1
          failureThreshold: 3
Let's create this deployment and check the status of pods created as part of this deployment:
[root@controller ~]# kubectl create -f readiness-eg-1.yaml
deployment.apps/readiness-probe-demo created
Check the status of the pods:
As you can see, the containers are running but are still not marked as READY. This is because we defined initialDelaySeconds as 30, to give the nginx service some time to come up before we perform the first health check.
If we check the status again after a few seconds, the pods are now marked as READY.
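If you prefer not to keep re-running kubectl get pods, you can also watch the readiness transition live; the --watch flag streams updates as the Pod conditions change:

[root@controller ~]# kubectl get pods --watch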
You can use the kubectl describe command to check the Events for any failures.
[root@controller ~]# kubectl describe pod readiness-probe-demo-65b4cd6f7f-47rht
Sample events output from this command:
Let's delete our deployment
[root@controller ~]# kubectl delete deployment readiness-probe-demo
deployment.apps "readiness-probe-demo" deleted
Kubernetes startup probe
Startup probes were introduced in Kubernetes 1.16 to support cases where a container may require more time for initialization than initialDelaySeconds + failureThreshold * periodSeconds allows in a readiness probe. In general, you should use the same handler configuration for startup probes as for readiness probes, but with larger delays. If a container does not pass its startup probe within initialDelaySeconds + failureThreshold * periodSeconds, it will be killed and subjected to the Pod's restart policy.
If we define a startup probe for a container, Kubernetes does not execute the liveness or readiness probes for that container as long as its startup probe has not succeeded. Once the startup probe succeeds, Kubernetes begins executing the container's liveness and readiness probes.
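To make the gating behaviour concrete, here is a hedged sketch that combines a slow startup probe with a fast liveness probe; the liveness probe only starts running once the startup probe has succeeded. The container name, image, and timings are illustrative assumptions:

containers:
- name: legacy-app            # hypothetical slow-starting application
  image: nginx                # illustrative image only
  ports:
  - containerPort: 80
  startupProbe:
    httpGet:
      path: /
      port: 80
    failureThreshold: 30      # allow up to 30 * 10 = 300 seconds for startup
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /
      port: 80
    periodSeconds: 5          # once started, check liveness frequently
    failureThreshold: 3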
When should you use startup probes?
There might be situations where we have to account for exceptionally long startup and initialization times, such as when containerizing a legacy application. We could technically configure the readiness or the liveness probes to account for this fact, but that would defeat the purpose of these probes. The latter probes are meant to provide quick feedback to Kubernetes on the health and availability of the container. If we configure for long initial delays or periods, then this would counter the desired outcome.
Example: Using startup probe with httpGet
We will use one of our previous examples to demonstrate the behaviour of startup probes. We will create a simple pod using the following YAML file:
[root@controller ~]# cat startup-eg-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      failureThreshold: 30
      periodSeconds: 5
As you can see, the syntax for a startup probe is the same as for liveness and readiness probes; we just change the probe name.
Let us create this pod and check the status:
[root@controller ~]# kubectl create -f startup-eg-1.yaml
pod/startup-probe-httpget created
[root@controller ~]# kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
startup-probe-httpget   0/1     Running   0          16s
So our pod with the startup probe was created and is in the Running status, but it is not yet marked as READY. After waiting a few seconds, the pod is marked as READY:
[root@controller ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
startup-probe-httpget 1/1 Running 0 3m42s
So the advantage we get here is that the application has a maximum of around 2.7 minutes (initialDelaySeconds + failureThreshold * periodSeconds = 10 + 30 * 5 = 160 seconds) to finish its startup before the container is restarted. To demonstrate this, I created another pod with an incorrect port value for the startup probe health check.
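That manifest is not reproduced here; a plausible sketch, assuming the pod name startup-probe-httpget-fail and the wrong port 8080 seen in the event below, would be:

apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-fail
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    startupProbe:
      httpGet:
        path: /
        port: 8080            # nginx listens on 80, so the startup probe keeps failing
      initialDelaySeconds: 10
      failureThreshold: 30
      periodSeconds: 5

The kubelet reports the repeated startup probe failures in the pod's events: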
Warning Unhealthy 2m44s (x21 over 4m24s) kubelet Startup probe failed: Get "http://10.36.0.2:8080/": dial tcp 10.36.0.2:8080: connect: connection refused
Here this pod has been reported unhealthy for over 4m24s and yet it has been restarted only once, because the startup probe allows roughly 160 seconds (about 2.7 minutes) before the container is restarted.
[root@controller ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
startup-probe-httpget 1/1 Running 0 8m20s
startup-probe-httpget-fail 0/1 Running 1 4m41s
Let's delete these pods:
[root@controller ~]# kubectl delete pod startup-probe-httpget startup-probe-httpget-fail
pod "startup-probe-httpget" deleted
pod "startup-probe-httpget-fail" deleted
Summary
In this tutorial we learned about the different types of probes available in Kubernetes to check the health of Pods and containers. You can choose the type of probe based on your requirements. Liveness probes are not used very frequently unless you expect your application to get stuck in a deadlock scenario; in most cases a readiness probe is the preferred method, while for legacy applications that take more time to come up we can use startup probes. The usage syntax for startup probes is exactly the same as for liveness and readiness probes. You can further use exec, httpGet, or tcpSocket handlers with any of these probes to perform the health check.