Kubernetes runs workloads by placing containers into pods and scheduling those pods onto nodes. A node may be a virtual or physical machine, depending on the cluster setup. Each node contains the services necessary to run pods and is managed by the Kubernetes control plane. There will always be cases where you need to stop scheduling pods on certain nodes, reschedule pods to different nodes, temporarily disable scheduling on a node, remove nodes from the cluster, and so on.
In brief, the steps required to gracefully remove a node from a Kubernetes cluster are:
- Drain the respective worker node
- Delete the respective worker node
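In terms of kubectl commands, this boils down to the following two commands, both of which are explained in detail later in this tutorial (replace the node name with your own):
kubectl drain worker-1.example.com --ignore-daemonsets
kubectl delete node worker-1.example.com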
In this tutorial we will cover and explain these steps in detail, with additional recommendations, so that the node is removed from the cluster without any impact on the pods.
Getting a list of nodes
To start working with nodes, you need to get a list of them first. You can do this with the kubectl get nodes command:
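[root@controller ~]# kubectl get nodes
NAME                     STATUS   ROLES                  AGE   VERSION
controller.example.com   Ready    control-plane,master   36h   v1.20.5
worker-1.example.com     Ready    <none>                 36h   v1.20.5
worker-2.example.com     Ready    <none>                 36h   v1.20.5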
The command output shows that we have one controller node and two worker nodes (your node names and counts will of course differ), all of which are in the Ready state.
Cordoning nodes
Let us assume that we want to perform an application load test but we don't want the application to be deployed on a certain node. Since all the worker nodes are in the Ready state, any of them could be chosen by the scheduler, so we must first disable scheduling on the respective worker node.
In this example I will disable scheduling on worker-1.example.com. But before that, let's check if we have any pods running on worker-1:
[root@controller ~]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
nginx-deploy-d98cc8bdb-57tg8   1/1     Running   0          49s   10.142.0.132   worker-1.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-dtkhh   1/1     Running   0          49s   10.142.0.133   worker-1.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-ssgtw   1/1     Running   0          49s   10.142.0.3     worker-2.example.com   <none>           <none>
So we already have two pods running on worker-1.example.com. Cordoning a worker node disables scheduling of new pods only; any existing pods will continue to run on that node, but no new pods will be scheduled onto it. So if the nginx-deploy-d98cc8bdb-dtkhh pod is recreated, it will not be placed on worker-1.
Let's go ahead and cordon worker-1.example.com:
[root@controller ~]# kubectl cordon worker-1.example.com
node/worker-1.example.com cordoned
Check the status of the nodes; worker-1.example.com now shows Ready,SchedulingDisabled, similar to the following:
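[root@controller ~]# kubectl get nodes
NAME                     STATUS                     ROLES                  AGE   VERSION
controller.example.com   Ready                      control-plane,master   36h   v1.20.5
worker-1.example.com     Ready,SchedulingDisabled   <none>                 36h   v1.20.5
worker-2.example.com     Ready                      <none>                 36h   v1.20.5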
Now let's delete one of the pods running on worker-1:
[root@controller ~]# kubectl delete pod nginx-deploy-d98cc8bdb-dtkhh
pod "nginx-deploy-d98cc8bdb-dtkhh" deleted
Since we have configured 3 replicas for this deployment, a new pod is automatically created. But if you check, the new pod was created on worker-2, as we have disabled scheduling on worker-1.
[root@controller ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deploy-d98cc8bdb-57tg8 1/1 Running 0 73s 10.142.0.132 worker-1.example.com <none> <none>
nginx-deploy-d98cc8bdb-ssgtw 1/1 Running 0 73s 10.142.0.3 worker-2.example.com <none> <none>
nginx-deploy-d98cc8bdb-wl5tp 1/1 Running 0 14s 10.142.0.4 worker-2.example.com <none> <none>
Once we are done with our testing, we can un-cordon the node to enable scheduling.
[root@controller ~]# kubectl uncordon worker-1.example.com
node/worker-1.example.com uncordoned
Check the status, and the worker-1 node should be in the Ready state again:
[root@controller ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
controller.example.com Ready control-plane,master 36h v1.20.5
worker-1.example.com Ready <none> 36h v1.20.5
worker-2.example.com Ready <none> 36h v1.20.5
Draining a node
Cordoning does not actually delete a node; it only disables scheduling on it. You might also want to evict all pods from a node that is going to be deleted, upgraded, or rebooted, for example. There is a command for that: drain.
We can choose to run the cordon command first on the node we want to delete, but it is optional: the drain command also cordons the node and additionally evicts all pods from it gracefully.
[root@controller ~]# kubectl cordon worker-1.example.com
node/worker-1.example.com cordoned
From the help section of the kubectl drain command:
The given node will be marked unschedulable to prevent new pods from arriving. 'drain' evicts the pods if the APIServer supports http://kubernetes.io/docs/admin/disruptions/ . Otherwise, it will use normal DELETE to delete the pods. The 'drain' evicts or deletes all pods except mirror pods (which cannot be deleted through the API server). If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings.
The --ignore-daemonsets flag must therefore be used to drain a node that is running DaemonSet-managed pods. We will drain the worker-1.example.com node and pass the --ignore-daemonsets flag so that the drain command does not fail if any DaemonSets are running on the node. We also pass --force, which lets the drain continue even if there are pods that are not managed by a controller.
[root@controller ~]# kubectl drain --ignore-daemonsets --force worker-1.example.com
node/worker-1.example.com already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hhz9s, kube-system/kube-proxy-rs4ct
evicting pod default/nginx-deploy-d98cc8bdb-57tg8
pod/nginx-deploy-d98cc8bdb-57tg8 evicted
node/worker-1.example.com evicted
At this stage the worker-1 node shows Ready,SchedulingDisabled:
[root@controller ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
controller.example.com Ready control-plane,master 36h v1.20.5
worker-1.example.com Ready,SchedulingDisabled <none> 36h v1.20.5
worker-2.example.com Ready <none> 36h v1.20.5
But if you check the pod status, all our pods have been moved to the worker-2 node:
[root@controller ~]# kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP           NODE                   NOMINATED NODE   READINESS GATES
nginx-deploy-d98cc8bdb-2dpch   1/1     Running   0          24s     10.142.0.5   worker-2.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-ssgtw   1/1     Running   0          4m25s   10.142.0.3   worker-2.example.com   <none>           <none>
nginx-deploy-d98cc8bdb-wl5tp   1/1     Running   0          3m26s   10.142.0.4   worker-2.example.com   <none>           <none>
Now we can safely go ahead and remove the node from our cluster in Kubernetes.
Removing the node from the Kubernetes cluster
The worker-1 node has been drained and is no longer running any deployments, pods, or StatefulSets, so it can now be deleted safely. To delete the node we will use:
[root@controller ~]# kubectl delete node worker-1.example.com
node "worker-1.example.com" deleted
Check the status of the available nodes, and worker-1.example.com should no longer be listed.
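On the tutorial cluster the remaining node list looks similar to this (ages will vary):
[root@controller ~]# kubectl get nodes
NAME                     STATUS   ROLES                  AGE   VERSION
controller.example.com   Ready    control-plane,master   36h   v1.20.5
worker-2.example.com     Ready    <none>                 36h   v1.20.5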
If the Kubernetes cluster supports node autoscaling, nodes can be added and removed automatically according to the autoscaling rules, i.e. by setting minimum and maximum node counts. When there is not much load in the cluster, unneeded nodes are removed down to the configured minimum, and when the load increases, the required number of nodes is added to accommodate the newly scheduled pods.
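As a brief illustration, assuming the standard Kubernetes cluster-autoscaler is deployed for the cluster, the minimum and maximum node counts are typically configured per node group through its --nodes flag (the node group name below is hypothetical):
# cluster-autoscaler argument format: --nodes=<min>:<max>:<node-group-name>
--nodes=1:5:my-worker-node-group
With such a setting the autoscaler would never scale that group below 1 node or above 5 nodes.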
Summary
In this tutorial, we have learned how to use kubectl to list the nodes running in the cluster and get information about them. We have seen how to cordon, drain, and remove nodes. It is important to follow the steps in the right order so that a worker node is deleted without impacting any of the pods running on it.