Kubernetes runs workloads by placing containers into pods and scheduling those pods onto nodes. A node may be a virtual or physical machine, depending on the cluster setup. Each node contains the services necessary to run pods and is managed by the Kubernetes control plane. There will always be cases where you need to stop scheduling pods on certain nodes, reschedule pods to different nodes, temporarily disable scheduling on a node, remove nodes from the cluster, and so on.
In brief, the steps required to gracefully remove a node from a Kubernetes cluster are:
- Drain the respective worker node
- Delete the respective worker node
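In terms of kubectl commands, this boils down to the following two commands, both of which are explained in detail later in this tutorial (replace the node name with your own):
kubectl drain worker-1.example.com --ignore-daemonsets
kubectl delete node worker-1.example.com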
In this tutorial we will cover and explain these steps in detail, with additional recommendations, so that the node is removed from the cluster without any impact on the pods.
Getting a list of nodes
To start working with nodes, you need to get a list of them first. You can do this with the kubectl get nodes command:
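[root@controller ~]# kubectl get nodes
NAME                     STATUS   ROLES                  AGE   VERSION
controller.example.com   Ready    control-plane,master   36h   v1.20.5
worker-1.example.com     Ready    <none>                 36h   v1.20.5
worker-2.example.com     Ready    <none>                 36h   v1.20.5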
The command output shows that we have one controller node and two worker nodes (your node names and counts will of course differ), all of which are in the Ready state.
Cordoning nodes
Let us assume that we want to perform an application load test but we don't want the application to be deployed on a certain node. Since all the worker nodes are in the Ready state, any of them could be chosen by the scheduler, so we must first disable scheduling on the respective worker node.
In this example I will disable scheduling on worker-1.example.com. But before that, let's check if we have any pods running on worker-1:
[root@controller ~]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
nginx-deploy-d98cc8bdb-57tg8   1/1     Running   0          49s   10.142.0.132   worker-1.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-dtkhh   1/1     Running   0          49s   10.142.0.133   worker-1.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-ssgtw   1/1     Running   0          49s   10.142.0.3     worker-2.example.com   <none>           <none>
So we already have two pods running on worker-1.example.com. Cordoning a worker node disables scheduling of new pods only; any existing pods will continue to run on that node, but no new pods will be scheduled onto it. So if the nginx-deploy-d98cc8bdb-dtkhh pod is recreated, it will not be placed on worker-1.
Let's go ahead and cordon worker-1.example.com:
[root@controller ~]# kubectl cordon worker-1.example.com
node/worker-1.example.com cordoned
Check the status of the nodes; worker-1.example.com now shows Ready,SchedulingDisabled, similar to the following:
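[root@controller ~]# kubectl get nodes
NAME                     STATUS                     ROLES                  AGE   VERSION
controller.example.com   Ready                      control-plane,master   36h   v1.20.5
worker-1.example.com     Ready,SchedulingDisabled   <none>                 36h   v1.20.5
worker-2.example.com     Ready                      <none>                 36h   v1.20.5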
Now let's delete one of the pods running on worker-1:
[root@controller ~]# kubectl delete pod nginx-deploy-d98cc8bdb-dtkhh
pod "nginx-deploy-d98cc8bdb-dtkhh" deleted
Since we have configured 3 replicas for this deployment, a new pod is automatically created. But if you check, the new pod was created on worker-2, as we have disabled scheduling on worker-1.
[root@controller ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deploy-d98cc8bdb-57tg8 1/1 Running 0 73s 10.142.0.132 worker-1.example.com <none> <none>
nginx-deploy-d98cc8bdb-ssgtw 1/1 Running 0 73s 10.142.0.3 worker-2.example.com <none> <none>
nginx-deploy-d98cc8bdb-wl5tp 1/1 Running 0 14s 10.142.0.4 worker-2.example.com <none> <none>
Once we are done with our testing, we can un-cordon the node to enable scheduling.
[root@controller ~]# kubectl uncordon worker-1.example.com
node/worker-1.example.com uncordoned
Check the status, and the worker-1 node should be in the Ready state again:
[root@controller ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
controller.example.com Ready control-plane,master 36h v1.20.5
worker-1.example.com Ready <none> 36h v1.20.5
worker-2.example.com Ready <none> 36h v1.20.5
Draining a node
Cordoning does not actually delete a node; it only disables scheduling on it. You might also want to evict all pods from a node that is going to be deleted, upgraded, or rebooted, for example. There is a command for that: drain.
We can choose to run the cordon command first on the node we want to delete, but it is optional: the drain command also cordons the node and additionally evicts all pods from it gracefully.
[root@controller ~]# kubectl cordon worker-1.example.com
node/worker-1.example.com cordoned
From the help section of the kubectl drain command:
The given node will be marked unschedulable to prevent new pods from arriving. 'drain' evicts the pods if the APIServer supports http://kubernetes.io/docs/admin/disruptions/ . Otherwise, it will use normal DELETE to delete the pods. The 'drain' evicts or deletes all pods except mirror pods (which cannot be deleted through the API server). If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings.
The --ignore-daemonsets flag must therefore be used to drain a node that is running DaemonSet-managed pods. We will drain the worker-1.example.com node and pass the --ignore-daemonsets flag so that the drain command does not fail if any DaemonSets are running on the node. We also pass --force, which lets the drain continue even if there are pods that are not managed by a controller.
[root@controller ~]# kubectl drain --ignore-daemonsets --force worker-1.example.com
node/worker-1.example.com already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hhz9s, kube-system/kube-proxy-rs4ct
evicting pod default/nginx-deploy-d98cc8bdb-57tg8
pod/nginx-deploy-d98cc8bdb-57tg8 evicted
node/worker-1.example.com evicted
At this stage the worker-1 node shows Ready,SchedulingDisabled:
[root@controller ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
controller.example.com Ready control-plane,master 36h v1.20.5
worker-1.example.com Ready,SchedulingDisabled <none> 36h v1.20.5
worker-2.example.com Ready <none> 36h v1.20.5
But if you check the pod status, all our pods have been moved to the worker-2 node:
[root@controller ~]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP           NODE                   NOMINATED NODE   READINESS GATES
nginx-deploy-d98cc8bdb-2dpch   1/1     Running   0          24s     10.142.0.5   worker-2.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-ssgtw   1/1     Running   0          4m25s   10.142.0.3   worker-2.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-wl5tp   1/1     Running   0          3m26s   10.142.0.4   worker-2.example.com   <none>           <none>
Now we can safely go ahead and remove the node from our cluster in Kubernetes.
Removing the node from the Kubernetes cluster
The worker-1 node has been drained and is no longer running any deployments, pods, or StatefulSets, so it can now be deleted safely. To delete the node we will use:
[root@controller ~]# kubectl delete node worker-1.example.com
node "worker-1.example.com" deleted
Check the status of the available nodes, and worker-1.example.com should no longer be listed.
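On the tutorial cluster the remaining node list looks similar to this (ages will vary):
[root@controller ~]# kubectl get nodes
NAME                     STATUS   ROLES                  AGE   VERSION
controller.example.com   Ready    control-plane,master   36h   v1.20.5
worker-2.example.com     Ready    <none>                 36h   v1.20.5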
If the Kubernetes cluster supports node autoscaling, nodes can be added and removed automatically according to the autoscaling rules, i.e. by setting minimum and maximum node counts. When there is not much load in the cluster, unneeded nodes are removed down to the configured minimum, and when the load increases, the required number of nodes is added to accommodate the newly scheduled pods.
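As a brief illustration, assuming the standard Kubernetes cluster-autoscaler is deployed for the cluster, the minimum and maximum node counts are typically configured per node group through its --nodes flag (the node group name below is hypothetical):
# cluster-autoscaler argument format: --nodes=<min>:<max>:<node-group-name>
--nodes=1:5:my-worker-node-group
With such a setting the autoscaler would never scale that group below 1 node or above 5 nodes.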
Summary
In this tutorial, we have learned how to use kubectl to list the nodes running in the cluster and get information about them. We have seen how to cordon, drain, and remove nodes. It is important to follow the steps in the right order so that a worker node is deleted without impacting any of the pods running on it.