In this article I will give an overview of what fencing is, followed by a step-by-step tutorial to configure cluster fencing (pacemaker fencing) using fence_xvm on my KVM HA Cluster running RHEL and CentOS 8.
I assume you are already familiar with the High Availability Cluster architecture.
What is Fencing?
- As the number of nodes in a cluster increases, its availability increases, but so does the chance of one of them failing at some point.
- If communication with a single node in the cluster fails, then other nodes in the cluster must be able to restrict or release access to resources that the failed cluster node may have access to.
- This cannot be accomplished by contacting the cluster node itself as the cluster node may not be responsive.
- Instead, you must provide an external method; this is called fencing and is performed with a fence agent.
- By definition, cluster fencing is the process of isolating, or separating, a node from the rest of the cluster so that it cannot use resources or start services it should not have access to.
- Without a fence device configured you do not have a way to know that the resources previously used by the disconnected cluster node have been released, and this could prevent the services from running on any of the other cluster nodes.
- Without a fence device configured data integrity cannot be guaranteed and the cluster configuration will be unsupported.
- When fencing is in progress, no other cluster operation is allowed to run.
- Fencing is performed using a mechanism known as STONITH.
- STONITH is an acronym for "Shoot The Other Node In The Head", and it protects your data from being corrupted by rogue nodes or concurrent access.
Setup KVM HA Cluster
In our previous article I configured a KVM High Availability Cluster using the Pacemaker GUI.
I will use the same pacemaker cluster setup here to configure fencing using fence_xvm.
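Before we add fencing, it is worth confirming that the existing cluster is healthy. A quick check from any cluster node (the node names are from my setup):
[root@centos8-2 ~]# pcs status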
Install Stonith Device on KVM Host
Now that you are familiar with what fencing is, we must install the fence-related rpms on the KVM host to configure cluster fencing for the KVM Virtual Machines.
Install the below list of rpms on your KVM host to configure pacemaker fencing using fence_xvm:
[root@rhel-8 ~]# yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial
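Once the installation completes you can quickly confirm that the packages are present (a simple rpm query; the versions will depend on your repositories):
[root@rhel-8 ~]# rpm -q fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast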
Install fence_xvm on KVM Virtual Machines
Install the fence-virt package on every cluster node:
[root@centos8-2 ~]# dnf -y install fence-virt
[root@centos8-3 ~]# dnf -y install fence-virt
[root@centos8-4 ~]# dnf -y install fence-virt
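You can verify that the fence_xvm agent is now available on a node (a quick check; repeat on the other nodes if you like):
[root@centos8-2 ~]# rpm -q fence-virt
[root@centos8-2 ~]# which fence_xvm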
To list the available fence agents, execute the below command on any of the cluster nodes:
# pcs stonith list
fence_amt_ws - Fence agent for AMT (WS)
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
fence_bladecenter - Fence agent for IBM BladeCen
<Output trimmed>
This will give you a long list of fence agents which you can use to configure cluster fencing
To get more details about the respective fence agent you can use:
[root@centos8-3 ~]# pcs stonith describe fence_xvm
fence_xvm - Fence agent for virtual machines

fence_xvm is an I/O Fencing agent which can be used with virtual machines.

Stonith options:
  debug: Specify (stdin) or increment (command line) debug level
  ip_family: IP Family ([auto], ipv4, ipv6)
  multicast_address: Multicast address (default=225.0.0.12 / ff05::3:1)
  ipport: TCP, Multicast, or VMChannel IP port (default=1229)
  retrans: Multicast retransmit time (in 1/10sec; default=20)
<Output trimmed>
Create fence key
To set up pacemaker fencing we must create a fence key on the KVM host under /etc/cluster. By default the /etc/cluster directory is not present on the KVM host, so we will create it manually:
[root@rhel-8 ~]# mkdir -p /etc/cluster
Next create the fence key using the dd command. We will name our key fence_xvm.key:
[root@rhel-8 ~]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000187547 s, 21.8 MB/s
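Optionally you can tighten the permissions on the key so that only root can read it; fence_virtd and fence_xvm both run as root, so this does not break anything (an optional hardening step, not strictly required):
[root@rhel-8 ~]# chmod 600 /etc/cluster/fence_xvm.key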
Next copy this key to all the KVM HA Cluster nodes under /etc/cluster
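The /etc/cluster directory may also be missing on the cluster nodes. If so, create it on each node first; a small sketch assuming passwordless SSH as root to the nodes (adjust the node names to your environment):
[root@rhel-8 ~]# for node in centos8-2 centos8-3 centos8-4; do ssh $node mkdir -p /etc/cluster; done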
[root@rhel-8 ~]# scp /etc/cluster/fence_xvm.key centos8-2:/etc/cluster/
[root@rhel-8 ~]# scp /etc/cluster/fence_xvm.key centos8-3:/etc/cluster/
[root@rhel-8 ~]# scp /etc/cluster/fence_xvm.key centos8-4:/etc/cluster/
Configure Cluster Fencing
To configure cluster fencing on the KVM host we will use fence_virtd. This tool creates the /etc/fence_virt.conf configuration file.
It will prompt for certain values; you can leave most of them at their defaults or change them as per your environment:
[root@rhel-8 ~]# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.3
Available listeners:
    multicast 1.2

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:        <-- Leave to default

Using ipv4 as family.

Multicast IP Port [1229]:                 <-- If you change this then remember to allow this port in firewall

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.

Interface [virbr0]:                       <-- I am using virbr0. You can change based on your interface used for Cluster nodes

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:    <-- Leave to default

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:                 <-- Leave to default

The libvirt backend module is designed for single desktops or
servers.  Do not use in environments where virtual machines
may be migrated between hosts.

Libvirt URI [qemu:///system]:             <-- Leave to default

Configuration complete.

=== Begin Configuration ===
backends {
	libvirt {
		uri = "qemu:///system";
	}

}

listeners {
	multicast {
		port = "1229";
		family = "ipv4";
		interface = "virbr0";
		address = "225.0.0.12";
		key_file = "/etc/cluster/fence_xvm.key";
	}

}

fence_virtd {
	module_path = "/usr/lib64/fence-virt";
	backend = "libvirt";
	listener = "multicast";
}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y    <-- Give confirmation
Start fence_virtd Service
Next, start the fence_virtd service to enable cluster fencing:
[root@rhel-8 ~]# systemctl enable fence_virtd --now
Created symlink /etc/systemd/system/multi-user.target.wants/fence_virtd.service → /usr/lib/systemd/system/fence_virtd.service.
Check the status of fence_virtd to make sure it is running successfully:
[root@rhel-8 ~]# systemctl status fence_virtd
● fence_virtd.service - Fence-Virt system host daemon
Loaded: loaded (/usr/lib/systemd/system/fence_virtd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-05-01 12:00:35 IST; 6s ago
Process: 24945 ExecStart=/usr/sbin/fence_virtd $FENCE_VIRTD_ARGS (code=exited, status=0/SUCCESS)
Main PID: 24946 (fence_virtd)
Tasks: 1 (limit: 26213)
Memory: 2.8M
CGroup: /system.slice/fence_virtd.service
└─24946 /usr/sbin/fence_virtd -w
May 01 12:00:35 rhel-8.example.com systemd[1]: Starting Fence-Virt system host daemon...
May 01 12:00:35 rhel-8.example.com fence_virtd[24946]: fence_virtd starting. Listener: libvirt Backend: multicast
May 01 12:00:35 rhel-8.example.com systemd[1]: Started Fence-Virt system host daemon.
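You can also check that the multicast listener has bound its port; a quick check with ss, assuming the default UDP port 1229 (the exact output and binding details can differ with your fence-virt version):
[root@rhel-8 ~]# ss -ulnp | grep 1229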
Fencing must be enabled on the cluster; verify that the stonith-enabled property is set to true:
[root@centos8-2 ~]# pcs -f stonith_cfg property
Cluster Properties:
stonith-enabled: true
If the stonith-enabled cluster property is set to false, you can manually set it to true (this is a cluster-wide property, so it only needs to be set once):
[root@centos8-2 ~]# pcs -f stonith_cfg property set stonith-enabled=true
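To double check the value on the running cluster (and not just in the stonith_cfg file), you can list the live cluster properties; on newer pcs versions the equivalent sub-command is pcs property config:
[root@centos8-2 ~]# pcs property list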
Configure Firewall
Since we are using the default port 1229 for fence_virtd, we must allow this port in the firewall. As we are using firewalld, we will allow the port in our firewalld zone.
To get the list of active zones along with their interface details:
[root@rhel-8 ~]# firewall-cmd --get-active-zones
libvirt
  interfaces: virbr0
public
  interfaces: eno49 eno50 nm-bridge
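If you are not sure which zone manages a particular interface, you can also query it directly (virbr0 is the interface from my setup):
[root@rhel-8 ~]# firewall-cmd --get-zone-of-interface=virbr0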
Since my cluster nodes are connected through the virbr0 interface, I must add my firewall rules to the zone which manages the virbr0 interface. By default we would apply all rules to the public zone, but that was not working for me and I was getting:
[root@centos8-2 ~]# fence_xvm -o list
Timed out waiting for response
Operation failed
But if you are using an interface from the default zone, then you can apply these firewall rules to your default zone. I will use the libvirt firewall zone; you can modify the firewall command based on your active zone:
[root@rhel-8 ~]# firewall-cmd --add-port=1229/udp --permanent --zone=libvirt
[root@rhel-8 ~]# firewall-cmd --add-port=1229/tcp --permanent --zone=libvirt
Reload the firewall rule to activate the changes
[root@rhel-8 ~]# firewall-cmd --reload
success
List the currently allowed ports in firewall
[root@rhel-8 ~]# firewall-cmd --list-ports --zone=libvirt
1229/udp 1229/tcp
To list all the services and ports allowed in the "libvirt" zone:
[root@rhel-8 ~]# firewall-cmd --list-all --zone=libvirt
libvirt (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces: virbr0
  sources:
  services: dhcp dhcpv6 dns ssh tftp
  ports:
  protocols: icmp ipv6-icmp
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
	rule priority="32767" reject
Verify Pacemaker fencing on Cluster Nodes
To check the fence status from the cluster nodes, run fence_xvm on any of the cluster nodes as shown below. This should show the list of virtual machines managed by the KVM host:
[root@centos8-2 ~]# fence_xvm -o list
centos8-2            a0c0680a-5655-48ae-9752-fda306e015ed on
centos8-3            3ee94484-bf3b-4636-8d64-f4e59a8c5a6d on
centos8-4            638841fe-82c6-4fbb-a79a-780c4675b4e6 on
rhel-iscsi           e0a7fd5f-3b53-4a7c-9a5c-3d2ca4b9c4f6 on
This means that our KVM host is configured to fence all of these VMs.
The output of this list should match the set of virtual machines reported by the virsh command:
[root@rhel-8 ~]# virsh list
 Id    Name          State
----------------------------------------------------
 75    rhel-iscsi    running
 80    centos8-3     running
 81    centos8-2     running
 83    centos8-4     running
 91    centos8-5     running
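If you want to cross check the UUIDs reported by fence_xvm, you can also query a domain's UUID directly on the KVM host (centos8-4 is one of the VM names from this setup):
[root@rhel-8 ~]# virsh domuuid centos8-4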
Create Stonith Resource
We will create a stonith resource for cluster fencing for each of our cluster nodes:
[root@centos8-2 ~]# pcs stonith create fence-centos8-4 fence_xvm port=centos8-4 pcmk_host_list=centos8-4.example.com
[root@centos8-2 ~]# pcs stonith create fence-centos8-3 fence_xvm port=centos8-3 pcmk_host_list=centos8-3.example.com
[root@centos8-2 ~]# pcs stonith create fence-centos8-2 fence_xvm port=centos8-2 pcmk_host_list=centos8-2.example.com
pcmk_host_list format: This attribute takes a list of nodes separated by space, comma, or semicolon. The names should exactly match what pacemaker refers to them as, which is derived from the base configuration in /etc/corosync/corosync.conf and is also reflected in the pcs status output.
The port value must match the virtual machine name as shown in the virsh output, or else the pacemaker fencing would fail.
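To review the attributes of a stonith resource you have created, including pcmk_host_list, you can dump its configuration from any cluster node (on older pcs versions the equivalent command is pcs stonith show --full):
[root@centos8-2 ~]# pcs stonith config fence-centos8-2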
Verify Stonith Resource Health
After creating the stonith resources on the KVM HA Cluster nodes, verify the resource status using crm_mon:
[root@centos8-2 ~]# crm_mon
Stack: corosync
Current DC: centos8-2 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Fri May  1 13:08:25 2020
Last change: Fri May  1 13:08:14 2020 by root via cibadmin on centos8-2

3 nodes configured
3 resources configured

Online: [ centos8-2 centos8-3 centos8-4 ]

Active resources:

fence-centos8-4 (stonith:fence_xvm):    Started centos8-2
fence-centos8-3 (stonith:fence_xvm):    Started centos8-3
fence-centos8-2 (stonith:fence_xvm):    Started centos8-4
So all our stonith resources have started successfully. You can also check the stonith resource status using pcs:
[root@centos8-2 ~]# pcs stonith status
 fence-centos8-4 (stonith:fence_xvm):    Started centos8-2
 fence-centos8-3 (stonith:fence_xvm):    Started centos8-3
 fence-centos8-2 (stonith:fence_xvm):    Started centos8-4
Verify Cluster Fencing
To actually fence a node, you will have to use the UUID listed by the list command instead of the VM name. In this example I am triggering fencing for centos8-4:
[root@centos8-2 ~]# fence_xvm -o off -H 638841fe-82c6-4fbb-a79a-780c4675b4e6
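Alternatively, instead of calling fence_xvm by hand, you can ask pacemaker itself to fence a node through the stonith resource we configured. Only run this against a node you can afford to take down, since it will actually fence it (the node name below is the pacemaker node name from pcs status in my setup):
[root@centos8-2 ~]# pcs stonith fence centos8-4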
Next check the status of the KVM Cluster
[root@centos8-2 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: centos8-2 (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Fri May 1 13:29:15 2020
Last change: Fri May 1 13:08:14 2020 by root via cibadmin on centos8-2
3 nodes configured
3 resources configured
PCSD Status:
centos8-2: Online
centos8-3: Online
centos8-4: Offline
As expected, our centos8-4 cluster node has gone offline. You can also check the log on the KVM host using journalctl:
May 1 13:29:34 rhel-8 systemd-machined[1877]: Machine qemu-74-centos8-4 terminated.
So our cluster fencing is working as expected.
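To bring the fenced node back into the cluster, start the VM again from the KVM host and then start the cluster services on it once it has booted (a sketch using the names from this example):
[root@rhel-8 ~]# virsh start centos8-4
[root@centos8-4 ~]# pcs cluster start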
Lastly, I hope the steps in this article to understand what fencing is and to configure cluster fencing (pacemaker fencing) using fence_xvm on a KVM HA Cluster with RHEL/CentOS 8 Linux were helpful. Let me know your suggestions and feedback using the comment section.
Hi, and thanks for the info provided. I did all the steps and things are working as expected; I am able to fence nodes by executing the fence_xvm command. However, I have a 2-node cluster and I was expecting that when connectivity is lost between the cluster members, fencing would automatically take place and one machine would reset, but this is not happening.
Can you please let me know what is missing?
To reboot a node with fencing would require a different fence agent, such as fence_bladecenter, which can be used with blade servers. In that case the fencing agent controls the power of the blades to reboot the server, but that is not possible here with the KVM fence agent.
Thank you for your reply. So what will be the real benefit of fencing in this case if I can't use fence_xvm to reset my problematic VM automatically?
The idea of using fencing is to make sure the impacted node has no access to the resources shared by your cluster nodes. Fencing will stop the access for the problematic cluster node; how this is done differs based on the fencing agent.
Thank you so much, you helped me a lot.