In this article I will share the commands to clean up failed actions from the pcs status output of a High Availability Pacemaker cluster.
When a resource fails to start in the cluster, the failure is logged as a failed action in pcs status. These failed actions continue to appear in the pcs status output even after the resource has started successfully.
In such cases we can clear the failed actions from pcs status manually.
Issue: Cleanup failed action messages from pcs status
Below is a sample pcs status output from my KVM High Availability cluster. It shows two types of "Failed Actions":
- Failed Resource Actions
- Failed Fencing Actions
To check the cluster status:
[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat May 2 14:38:27 2020
Last change: Sat May 2 14:38:23 2020 by root via cibadmin on centos8-2
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
fence-centos8-3 (stonith:fence_xvm): Started centos8-3
fence-centos8-2 (stonith:fence_xvm): Started centos8-2
ClusterIP (ocf::heartbeat:IPaddr2): Started centos8-4
fence-centos8-4 (stonith:fence_xvm): Started centos8-3
Failed Resource Actions:
* fence-centos8-2_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=122, status=Timed Out, exitreason='', last-rc-change='Sat May 2 14:36:16 2020', queued=1ms, exec=20012ms
* fence-centos8-4_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=124, status=Timed Out, exitreason='', last-rc-change='Sat May 2 14:36:36 2020', queued=0ms, exec=20011ms
Failed Fencing Actions:
* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Sat May 2 14:37:17 2020'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Fri May 1 20:57:41 2020'
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
My resources and fencing devices have now started successfully, so I no longer need these failed action messages.
The commands to clean up failed actions are different for resources and for fencing.
Solution: Cleanup Failed Actions for Resource
To clean up the messages listed under "Failed Resource Actions", use pcs resource cleanup <resource>. You can get the resource name from the Failed Resource Actions output.
Below is the output from my pcs status
* fence-centos8-2_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=122, status=Timed Out, exitreason='', last-rc-change='Sat May 2 14:36:16 2020', queued=1ms, exec=20012ms
* fence-centos8-4_start_0 on centos8-4 'OCF_TIMEOUT' (198): call=124, status=Timed Out, exitreason='', last-rc-change='Sat May 2 14:36:36 2020', queued=0ms, exec=20011ms
Here the resource names are fence-centos8-2 and fence-centos8-4 (the part before the _start_0 suffix), which you can also check using "pcs resource status".
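If there are many failed actions, picking the names out by hand gets tedious. Below is a minimal sketch of a shell helper that reads pcs status text on stdin and prints the resource names found under "Failed Resource Actions". The function name failed_resources is my own (hypothetical), and the sketch assumes the output format shown in this article, where each entry starts with <resource>_<operation>_<interval>. Other Pacemaker versions may format these lines differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): print the resource names listed
# under "Failed Resource Actions" in `pcs status` text read from stdin.
# Assumes entries look like "* <resource>_<operation>_<interval> on <node> ...".
failed_resources() {
    awk '
        /^[ \t]*Failed Resource Actions:/ { in_section = 1; next }
        # A blank line or the next "Heading:" line ends the section.
        in_section && (NF == 0 || /^[A-Za-z].*:[ \t]*$/) { in_section = 0 }
        in_section && $1 == "*" {
            name = $2
            # Strip a trailing _<operation>_<interval>, e.g. "_start_0".
            # The list of operation names below is an assumption.
            sub(/_(start|stop|monitor|promote|demote|notify|reload|migrate_to|migrate_from)_[0-9]+$/, "", name)
            print name
        }
    ' | sort -u
}
```

For example, pcs status | failed_resources prints one resource name per line.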
So to clean up the failed action messages for the fence-centos8-2 resource, use:
[root@centos8-2 ~]# pcs resource cleanup fence-centos8-2
Cleaned up fence-centos8-2 on centos8-4
Cleaned up fence-centos8-2 on centos8-3
Cleaned up fence-centos8-2 on centos8-2
Waiting for 1 reply from the controller. OK
Similarly, to clean up the failed action messages for the fence-centos8-4 resource:
[root@centos8-2 ~]# pcs resource cleanup fence-centos8-4
Cleaned up fence-centos8-4 on centos8-4
Cleaned up fence-centos8-4 on centos8-3
Cleaned up fence-centos8-4 on centos8-2
Waiting for 1 reply from the controller. OK
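When several resources need the same treatment, the two commands above can be wrapped in a small loop. This is only a sketch: the function name cleanup_resources and the PCS variable (which lets you swap in echo for a dry run) are my own additions, not part of pcs. As far as I know, running pcs resource cleanup without a resource name cleans up all resources at once, so the loop is mainly useful when you want to limit the cleanup to specific resources.

```shell
#!/bin/sh
# Hypothetical wrapper: run `pcs resource cleanup <resource>` for every
# resource name given as an argument.
# PCS is an assumed override for the pcs binary, e.g. PCS=echo for a dry run.
cleanup_resources() {
    rc=0
    for res in "$@"; do
        # Remember a failure but keep going with the remaining resources.
        ${PCS:-pcs} resource cleanup "$res" || rc=1
    done
    return "$rc"
}
```

For example: cleanup_resources fence-centos8-2 fence-centos8-4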
After performing cleanup, check the cluster status
[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat May 2 14:39:19 2020
Last change: Sat May 2 14:39:17 2020 by hacluster via crmd on centos8-4
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
fence-centos8-3 (stonith:fence_xvm): Started centos8-3
fence-centos8-2 (stonith:fence_xvm): Started centos8-2
ClusterIP (ocf::heartbeat:IPaddr2): Started centos8-4
fence-centos8-4 (stonith:fence_xvm): Started centos8-3
Failed Fencing Actions:
* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Sat May 2 14:37:17 2020'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Fri May 1 20:57:41 2020'
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
We no longer have any Failed Resource Actions. Next we will clean up the failed action messages for fencing.
Solution: Cleanup Failed Actions for Fencing
pcs status still shows failed action messages for fencing. To clean these up we will use "pcs stonith history cleanup <node>". Note that this command takes the name of the fenced node, not the name of a fencing resource.
Before we perform the cleanup, we can check the complete history of failed fencing actions for a node using "pcs stonith history show <node>":
[root@centos8-2 ~]# pcs stonith history show centos8-2
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat May 2 14:36:57 2020
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat May 2 14:36:37 2020
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat May 2 14:36:17 2020
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat May 2 14:37:16 2020
We failed reboot node centos8-2 on behalf of pacemaker-controld.1548 from centos8-3 at Sat May 2 14:37:17 2020
0 events found
We can get the node name from the Failed Fencing Actions output of pcs status:
* reboot of centos8-2 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Sat May 2 14:37:17 2020'
* reboot of centos8-4 failed: delegate=, client=pacemaker-controld.1548, origin=centos8-3, last-failed='Fri May 1 20:57:41 2020'
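The node names can also be pulled out of the pcs status text with a short helper. This is a rough sketch under the assumption that each entry has the form "* <action> of <node> failed: ..." as in my output above. The function name failed_fencing_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): print the node names listed under
# "Failed Fencing Actions" in `pcs status` text read from stdin.
# Assumes entries look like "* reboot of <node> failed: ...".
failed_fencing_nodes() {
    awk '
        /^[ \t]*Failed Fencing Actions:/ { in_section = 1; next }
        # A blank line or the next "Heading:" line ends the section.
        in_section && (NF == 0 || /^[A-Za-z].*:[ \t]*$/) { in_section = 0 }
        # Field 4 is the node name when field 3 is the word "of".
        in_section && $1 == "*" && $3 == "of" { print $4 }
    ' | sort -u
}
```

For example, pcs status | failed_fencing_nodes prints each affected node once, and every name it prints can be passed to pcs stonith history cleanup.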
To clean up the failed fencing action messages, run the cleanup once for each node:
[root@centos8-2 ~]# pcs stonith history cleanup centos8-2
cleaning up fencing-history for node centos8-2
0 events found
[root@centos8-2 ~]# pcs stonith history cleanup centos8-4
cleaning up fencing-history for node centos8-4
0 events found
Now check the Pacemaker cluster status using pcs status:
[root@centos8-2 ~]# pcs status
Cluster name: ha-cluster
Stack: corosync
Current DC: centos8-3 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sat May 2 14:41:05 2020
Last change: Sat May 2 14:39:17 2020 by hacluster via crmd on centos8-4
3 nodes configured
4 resources configured
Online: [ centos8-2 centos8-3 centos8-4 ]
Full list of resources:
fence-centos8-3 (stonith:fence_xvm): Started centos8-3
fence-centos8-2 (stonith:fence_xvm): Started centos8-2
ClusterIP (ocf::heartbeat:IPaddr2): Started centos8-4
fence-centos8-4 (stonith:fence_xvm): Started centos8-3
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
So we don't have any more failed action messages.
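If you want to script this final check instead of reading the output by eye, something like the following could work. It is a minimal sketch: the function name no_failed_actions is hypothetical, and it only looks for the two section headings shown in this article.

```shell
#!/bin/sh
# Hypothetical check (not part of pcs): succeed (exit 0) when the
# `pcs status` text on stdin contains neither a "Failed Resource Actions"
# nor a "Failed Fencing Actions" section heading.
no_failed_actions() {
    ! grep -Eq '^[[:space:]]*Failed (Resource|Fencing) Actions:'
}
```

For example: pcs status | no_failed_actions && echo "no failed actions left"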
Lastly, I hope the steps in this article to clean up failed action messages in a Pacemaker cluster on Linux were helpful. Let me know your suggestions and feedback in the comment section.