10 easy steps to set up a High Availability Cluster on CentOS 8

In this article I will share a step-by-step guide to set up a 3-node High Availability Cluster with LVM on CentOS 8. In a High Availability Cluster, different servers work together to make sure that the downtime of critical resources is reduced to a minimum.

The Components That Build a High Availability Cluster

To build a High Availability Cluster, you’ll need more than just a few servers that are tied together. Typically, the following components are used in most clusters:

Shared storage
Different networks
Bonded network devices
Multipathing
Fencing/STONITH devices

You can learn more about the individual components at Linux High Availability Clustering

Lab Environment

I have created four virtual machines on Oracle VirtualBox, which is installed on a Linux server: three with CentOS 8.1 and one with CentOS 7. The three CentOS 8.1 VMs will be part of our Linux HA Cluster, while the fourth VM will be used to configure iSCSI storage. We will use this iSCSI storage as shared storage across all the cluster nodes.

HINT

I would recommend configuring one-click installation using a network PXE boot server. Using a PXE server you can install Oracle Virtual Machines, KVM-based virtual machines or any type of physical server without manual intervention, saving time and effort.

Below are my environment specifications:

	HA-Cluster-Node1	HA-Cluster-Node2	HA-Cluster-Node3	iSCSI Storage Server
Hostname	centos8-1	centos8-2	centos8-3	server1
FQDN	centos8-1.example.com	centos8-2.example.com	centos8-3.example.com	server1.example.com
IP Address	10.10.10.12	10.10.10.16	10.10.10.17	10.10.10.2
OS	CentOS 8.1	CentOS 8.1	CentOS 8.1	RHEL 7

I have configured an internal network in Oracle VirtualBox over which the cluster nodes communicate with each other, and a separate network to connect to the external network.

Pre-requisite:

Update CentOS 8 to latest available release

The required repository to install High Availability rpms such as pacemaker is available starting with CentOS 8.1 so if you are using CentOS 8.0 then you must update your Linux environment using:

# dnf update

This will update your Linux server to the latest available CentOS release. Below are my CentOS release details

[root@centos8-1 ~]# cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)

The HighAvailability repository was added on:

2019-12-17 - [email protected] - 8-1.el8
- Add the HighAvailability repository

Update /etc/hosts or use DNS Server

All the nodes of the high availability cluster must be able to communicate with each other using their FQDN, so you can either:

Configure DNS server for name resolution
Update /etc/hosts

Below is the output of my /etc/hosts from all the cluster nodes

[root@centos8-1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.17   centos8-3         centos8-3.example.com
10.10.10.16   centos8-2         centos8-2.example.com
10.10.10.12   centos8-1         centos8-1.example.com
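
A quick way to confirm that no node is missing from a hosts file is sketched below. This is only a minimal sketch: missing_hosts is a helper name I made up for illustration, and it inspects the file contents only, not DNS.

```shell
#!/bin/sh
# Hypothetical helper: print every name that has no entry in the given hosts file.
# Usage: missing_hosts /etc/hosts name1 name2 ...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field after the IP column.
        awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}
```

On a correctly configured node, running missing_hosts /etc/hosts with the three FQDNs from the table above should print nothing. To test the resolver itself (including DNS), getent hosts followed by a node name can be used instead.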

Install EPEL repo

You may also need the EPEL repo, which you can install on your CentOS 8 Linux nodes

[root@centos8-1 ~]# dnf install epel-release

Configure Chrony NTP

In a High Availability Cluster it is important that the clocks of all the cluster nodes are synchronized, so you must set up the chrony service on all your cluster nodes.
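
As a rough sketch of what that could look like, the functions below install and enable chrony and then check the "Leap status" line that chronyc tracking prints. The function names setup_chrony and is_time_synced are my own, and I am assuming the default chrony configuration shipped with CentOS 8 is acceptable for your lab.

```shell
#!/bin/sh
# Install chrony and start chronyd now and at boot (run as root on every node).
setup_chrony() {
    dnf install -y chrony &&
    systemctl enable --now chronyd
}

# Read `chronyc tracking` output on stdin and succeed only if the
# "Leap status" line reports Normal, i.e. the clock is synchronised.
is_time_synced() {
    grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal'
}

# Example on a live node (not executed here):
#   setup_chrony && chronyc tracking | is_time_synced && echo "time is in sync"
```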

Let us start with the Step-by-Step Guide for CentOS 8 High Availability Cluster Configuration

Step 1: Configure shared storage

Since in this article we plan to set up a High Availability Cluster for an LVM resource, we need shared storage. This is my lab environment and I do not have access to SAN storage, so I will settle for iSCSI storage.

I have already written another article with a detailed list of steps to configure iSCSI storage using RHEL/CentOS 7/8 Linux, so I will not repeat the steps here.

After iSCSI discovery, I have /dev/sdc available on all my cluster nodes, connected as iSCSI storage from the iSCSI target. We will create the LVM resource on /dev/sdc in the later steps

[root@centos8-1 ~]# lsscsi
[0:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr1
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda
[3:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sdb
[4:0:0:0]    disk    LIO-ORG  sdb1             4.0   /dev/sdc
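
Device names such as /dev/sdc are not guaranteed to be the same on every node, so it can help to look the disk up by its SCSI vendor instead of assuming the letter. The sketch below filters lsscsi output like the one above; iscsi_disks is a hypothetical helper name, and it assumes the target is a Linux LIO target that reports the vendor LIO-ORG, as in my lab.

```shell
#!/bin/sh
# Read `lsscsi` output on stdin and print the device node of every disk
# whose vendor column is LIO-ORG (assumption: the iSCSI target is Linux LIO).
iscsi_disks() {
    awk '$2 == "disk" && $3 == "LIO-ORG" { print $NF }'
}

# Example on a live node (not executed here):
#   lsscsi | iscsi_disks
```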

Step 2: Install pacemaker and other High Availability rpms

By default the High Availability repository is disabled on CentOS 8 based on the below commit

2019-12-19 - [email protected] - 8-1.0.7
- Typo fixes
- Disable the HA repo by default

So before we install CentOS HA Cluster rpms, we will enable the HighAvailability repo

[root@centos8-1 ~]# dnf config-manager --set-enabled HighAvailability

List the enabled repos

[root@centos8-1 ~]# dnf repolist
Last metadata expiration check: 0:00:07 ago on Sun 05 Apr 2020 01:40:56 PM IST.
repo id                          repo name                                                               status
AppStream                        CentOS-8 - AppStream                                                    5,124
BaseOS                           CentOS-8 - Base                                                         2,126
HighAvailability                 CentOS-8 - HA                                                             130
PowerTools                       CentOS-8 - PowerTools                                                   1,525
*epel                            Extra Packages for Enterprise Linux 8 - x86_64                          5,138
*epel-modular                    Extra Packages for Enterprise Linux Modular 8 - x86_64                      0
extras                           CentOS-8 - Extras                                                          12
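
The repo ID shown above (HighAvailability) is what I have on CentOS 8.1. As far as I know, later CentOS 8 releases switched to lowercase repo IDs such as ha, so treat the exact ID as an assumption and check it on your release. A small sketch that picks the ID out of dnf repolist --all output (find_ha_repo is a name I made up):

```shell
#!/bin/sh
# Read `dnf repolist --all` output on stdin and print the first repo ID that
# looks like the High Availability repo, whatever its capitalisation.
find_ha_repo() {
    awk 'tolower($1) == "ha" || tolower($1) == "highavailability" { print $1; exit }'
}

# Example on a live node (not executed here):
#   dnf config-manager --set-enabled "$(dnf repolist --all | find_ha_repo)"
```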

Install pacemaker Linux and other high availability rpms using DNF which is the default package manager in RHEL/CentOS 8.

[root@centos8-1 ~]# dnf install pcs pacemaker fence-agents-all -y

Step 3: Start pacemaker cluster manager service

Before we configure Linux HA Cluster, the pcs daemon must be started and enabled to start at boot time on each node of the Linux HA Cluster. This daemon works with the pcs command-line interface to manage synchronizing the corosync configuration across all nodes in the cluster.

[root@centos8-1 ~]# systemctl enable pcsd.service --now
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service → /usr/lib/systemd/system/pcsd.service.
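
Several steps in this guide have to be repeated on every node. If you have root SSH access to all three nodes (an assumption on my part, it is not required by the cluster itself), a small loop saves some typing. run_on_all and the REMOTE variable are hypothetical names used only for this sketch; setting REMOTE=echo gives a dry run that just prints what would be executed.

```shell
#!/bin/sh
# Node list taken from the lab environment table in this article.
NODES="centos8-1.example.com centos8-2.example.com centos8-3.example.com"

# Run the given command on every node, stopping at the first failure.
# REMOTE defaults to ssh; set REMOTE=echo for a dry run.
run_on_all() {
    for node in $NODES; do
        ${REMOTE:-ssh} "root@$node" "$@" || return 1
    done
}

# Example (not executed here):
#   run_on_all systemctl enable --now pcsd.service
```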

Step 4: Assign password to hacluster

The hacluster user is created with a disabled password when the high availability cluster rpms are installed. Set a password for user hacluster on each node in the Linux HA cluster, using the same password on every node. You will use it later to authenticate user hacluster for each node from the node on which you will be running the pcs commands.

[root@centos8-1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Step 5: Configure firewalld

If you are running the firewalld daemon, enable the ports that are required by the High Availability cluster services on all the cluster nodes.

[root@centos8-1 ~]# firewall-cmd --permanent --add-service=high-availability
success

[root@centos8-1 ~]# firewall-cmd --reload
success
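
To double-check that the rule is really active after the reload, you can look for the service name in the output of firewall-cmd --list-services. A minimal sketch follows; has_ha_service is a helper name I invented, and it only checks the zone that firewall-cmd reports by default.

```shell
#!/bin/sh
# Read `firewall-cmd --list-services` output (space-separated names) on stdin
# and succeed only if high-availability is one of the listed services.
has_ha_service() {
    tr ' ' '\n' | grep -qx 'high-availability'
}

# Example on a live node (not executed here):
#   firewall-cmd --list-services | has_ha_service && echo "HA ports are open"
#   firewall-cmd --info-service=high-availability   # shows the ports behind the service
```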

HINT

During the POC stage you can choose to disable firewalld (systemctl stop firewalld) to verify the initial configuration and later enable the firewall before qualifying for production

Step 6: Configure Corosync

On any one of the cluster nodes, use pcs host auth to authenticate as the hacluster user. Use the below syntax:

pcs host auth [node1] [node2] [node3] ..

With RHEL/CentOS 7 High Availability Cluster with pacemaker Linux, we used pcs cluster auth to authenticate the cluster nodes, but this has changed in RHEL/CentOS 8 to "pcs host auth"

[root@centos8-1 ~]# pcs host auth centos8-1.example.com centos8-2.example.com centos8-3.example.com
Username: hacluster
Password:
centos8-2.example.com: Authorized
centos8-3.example.com: Authorized
centos8-1.example.com: Authorized

Now that the cluster nodes are authorized, we can proceed with the next step to set up the high availability cluster. Here I am creating a three-node Linux HA cluster with the name "my_cluster"

[root@centos8-1 ~]# pcs cluster setup my_cluster centos8-1.example.com centos8-2.example.com centos8-3.example.com
No addresses specified for host 'centos8-1.example.com', using 'centos8-1.example.com'
No addresses specified for host 'centos8-2.example.com', using 'centos8-2.example.com'
No addresses specified for host 'centos8-3.example.com', using 'centos8-3.example.com'
Destroying cluster on hosts: 'centos8-1.example.com', 'centos8-2.example.com', 'centos8-3.example.com'...
centos8-1.example.com: Successfully destroyed cluster
centos8-3.example.com: Successfully destroyed cluster
centos8-2.example.com: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'centos8-1.example.com', 'centos8-2.example.com', 'centos8-3.example.com'
centos8-1.example.com: successful removal of the file 'pcsd settings'
centos8-3.example.com: successful removal of the file 'pcsd settings'
centos8-2.example.com: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'centos8-1.example.com', 'centos8-2.example.com', 'centos8-3.example.com'
centos8-1.example.com: successful distribution of the file 'corosync authkey'
centos8-1.example.com: successful distribution of the file 'pacemaker authkey'
centos8-2.example.com: successful distribution of the file 'corosync authkey'
centos8-2.example.com: successful distribution of the file 'pacemaker authkey'
centos8-3.example.com: successful distribution of the file 'corosync authkey'
centos8-3.example.com: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'centos8-1.example.com', 'centos8-2.example.com', 'centos8-3.example.com'
centos8-1.example.com: successful distribution of the file 'corosync.conf'
centos8-3.example.com: successful distribution of the file 'corosync.conf'
centos8-2.example.com: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

Step 7: Start and Verify Cluster

Step 7.1: Start Linux HA Cluster

Now that corosync is configured, it is time to start the cluster. The command below will start corosync and pacemaker Linux on all our nodes in the cluster.

[root@centos8-1 ~]# pcs cluster start --all

Step 7.2: Verify corosync installation

corosync-cfgtool is a tool for displaying and configuring active parameters within corosync

[root@centos8-1 ~]# corosync-cfgtool -s
Printing link status.
Local node ID 1
LINK ID 0
        addr    = 10.10.10.12
        status:
                nodeid  1:      link enabled:1  link connected:1
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1

Here,

-s      Displays  the  status  of the current links on this node for UDP/UDPU, with extended status for KNET. 
        If any interfaces are faulty, 1 is returned by the binary. If all interfaces are active 0 is returned to the shell.

We can see here that everything appears normal: our fixed IP address (not a 127.0.0.x loopback address) is listed as the link address, and every node shows its link as enabled and connected.
If you see something different, you might want to start by checking the node's network, firewall and SELinux configurations.

Next, check the membership and quorum APIs:

[root@centos8-1 ~]# corosync-cmapctl | grep members
runtime.members.1.config_version (u64) = 0
runtime.members.1.ip (str) = r(0) ip(10.10.10.12)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.members.2.config_version (u64) = 0
runtime.members.2.ip (str) = r(0) ip(10.10.10.16)
runtime.members.2.join_count (u32) = 3
runtime.members.2.status (str) = joined
runtime.members.3.config_version (u64) = 0
runtime.members.3.ip (str) = r(0) ip(10.10.10.17)
runtime.members.3.join_count (u32) = 3
runtime.members.3.status (str) = joined
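
To relate this membership list to quorum: with one vote per node and no special quorum options (both assumptions, they match the defaults used in this article), a cluster of N nodes needs floor(N/2) + 1 votes, so our three-node cluster stays quorate with two nodes. The sketch below counts the joined members from the output above and computes that threshold; joined_members and quorum_votes are names I chose for illustration.

```shell
#!/bin/sh
# Read `corosync-cmapctl` output on stdin and print how many members are joined.
joined_members() {
    grep -c 'runtime\.members\.[0-9]*\.status (str) = joined'
}

# Print the votes needed for quorum with N one-vote nodes: floor(N/2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Example on a live node (not executed here):
#   corosync-cmapctl | joined_members
#   corosync-quorumtool -s      # corosync's own view of votes and quorum
```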

Check the status of corosync across cluster nodes

[root@centos8-1 ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 centos8-1.example.com (local)
         2          1 centos8-2.example.com
         3          1 centos8-3.example.com

Step 7.3: Verify Pacemaker Linux Installation

Now that we have confirmed that Corosync is functional, we can check the rest of the stack. Pacemaker Linux has already been started, so verify the necessary processes are running:

[root@centos8-1 ~]# pcs status
Cluster name: my_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: centos8-1.example.com (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sun Apr  5 15:25:03 2020
Last change: Sun Apr  5 15:24:23 2020 by hacluster via crmd on centos8-1.example.com

3 nodes configured
0 resources configured

Online: [ centos8-1.example.com centos8-2.example.com centos8-3.example.com ]

No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

We will also enable the corosync and pacemaker services to start automatically at boot on all the Linux HA Cluster nodes

[root@centos8-1 ~]# systemctl enable corosync
Created symlink /etc/systemd/system/multi-user.target.wants/corosync.service → /usr/lib/systemd/system/corosync.service.

[root@centos8-1 ~]# systemctl enable pacemaker
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
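
Instead of running systemctl on each node, pcs cluster enable --all should achieve the same from a single node. Either way, the "Daemon Status" block of pcs status tells you whether anything is still disabled. The helper below is only a sketch (daemons_not_enabled is my own name) and assumes the pcs status layout shown in this article.

```shell
#!/bin/sh
# Read `pcs status` output on stdin and print every daemon from the
# "Daemon Status" block that is not enabled at boot.
daemons_not_enabled() {
    awk '
        /^Daemon Status:/ { in_block = 1; next }
        in_block && NF == 2 && $2 !~ /\/enabled$/ { sub(/:$/, "", $1); print $1 }
    '
}

# Example on a live node (not executed here):
#   pcs cluster enable --all
#   pcs status | daemons_not_enabled    # prints nothing when all daemons are enabled
```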

Step 8: Disable Fencing (Optional)

A High Availability Cluster requires that you configure fencing so that the cluster can cut off a failed or unresponsive node from the shared resources. Fencing is also known as STONITH, an acronym for "Shoot The Other Node In The Head", since the most popular form of fencing is cutting a host’s power.

However in this article I only intend to show the usage of basic Pacemaker Linux commands, so we will disable fencing by setting the stonith-enabled cluster option to false.

[root@centos8-1 ~]# pcs property set stonith-enabled=false

WARNING

The use of stonith-enabled=false is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely powered off. Some vendors will refuse to support clusters that have STONITH disabled.

Step 9: Create an Active/Passive HA LVM Cluster

We will configure an active/passive Linux HA Cluster using an LVM resource.

Step 9.1: Define system_id_source

With RHEL/CentOS 7 we had to prepare LVM on all the cluster nodes to set up an HA LVM Cluster. With RHEL/CentOS 8 we can use the system_id_source setting in lvm.conf, so that LVM uses a system ID to decide which node of the HA LVM Cluster may access the volume group.

system_id_source is the method LVM uses to set the local system ID.
The lvm(8) system ID restricts Volume Group (VG) access to one host.
This is useful when a VG is placed on shared storage devices, or when local devices are visible to both host and guest operating systems.
In cases like these, a VG can be visible to multiple hosts at once, and some mechanism is needed to protect it from being used by more than one host at a time.

For more information check the man page of lvmsystemid

Update the /etc/lvm/lvm.conf file on all the cluster nodes and set the value of system_id_source to uname

system_id_source = "uname"
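
If you prefer not to edit the file by hand on three nodes, the change can be scripted. The function below is a minimal sketch, not the only way to do it: set_system_id_source is a name I made up, it assumes lvm.conf already contains a system_id_source line (commented out or not), and it keeps a .bak copy next to the file.

```shell
#!/bin/sh
# Rewrite the system_id_source line (commented or not) in the given lvm.conf
# so that it reads system_id_source = "uname". A backup is saved as <file>.bak.
set_system_id_source() {
    conf=${1:-/etc/lvm/lvm.conf}
    sed -i.bak -E \
        's/^([[:space:]]*)#?[[:space:]]*system_id_source[[:space:]]*=.*/\1system_id_source = "uname"/' \
        "$conf"
}

# Example on a live node (not executed here):
#   set_system_id_source /etc/lvm/lvm.conf
#   lvmconfig global/system_id_source    # should now report "uname"
```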

Verify that the LVM system ID on the node matches the uname for the node.

[root@centos8-1 ~]# lvm systemid
  system ID: centos8-1.example.com

[root@centos8-1 ~]# uname -n
centos8-1.example.com

Step 9.2: Rebuild initramfs

Rebuild initramfs on all the cluster nodes with the following steps. First take a backup of the current initramfs file on all of the cluster nodes.

# cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.$(date +%m-%d-%H%M%S).bak

Rebuild the initramfs file on all of the cluster nodes.

# dracut -f -v

Reboot all of the cluster nodes.

Step 9.3: Create logical volume on shared storage

Once all the Linux HA Cluster nodes are back online, proceed with the creation of the PV/VG/LV from any one of the cluster nodes. In the following example the VG name is cluster_vg and the LV name is cluster_lv.

Perform the below steps on only one of the cluster nodes

My shared storage is mapped to /dev/sdc

[root@centos8-1 ~]# lsscsi
[0:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr1
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda
[3:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sdb
[4:0:0:0]    disk    LIO-ORG  sdb1             4.0   /dev/sdc

Create physical volume on /dev/sdc

[root@centos8-1 ~]# pvcreate /dev/sdc
  Physical volume "/dev/sdc" successfully created.

Create a new Volume Group cluster_vg that consists of the physical volume /dev/sdc

[root@centos8-1 ~]# vgcreate cluster_vg /dev/sdc
  Volume group "cluster_vg" successfully created with system ID centos8-1.example.com

Verify that the new volume group has the system ID of the node on which you are running and from which you created the volume group.

[root@centos8-1 ~]# vgs -o+systemid
  VG         #PV #LV #SN Attr   VSize  VFree  System ID
  cluster_vg   1   1   0 wz--n-  9.96g     0  centos8-1.example.com
  rhel         2   5   0 wz--n- 22.49g <3.00g

Create logical volume cluster_lv

[root@centos8-1 ~]# lvcreate -l 100%FREE -n cluster_lv cluster_vg
  Logical volume "cluster_lv" created.

On other cluster nodes you will not find this volume group cluster_vg

[root@centos8-2 ~]# vgs -o+systemid
  VG   #PV #LV #SN Attr   VSize  VFree  System ID
  rhel   2   5   0 wz--n- 22.49g <3.00g

You can see that the new VG is only visible on centos8-1, where we created it, and that its System ID is the same as the hostname. On the second node, the vgs command does not show cluster_vg because its current System ID is centos8-1.example.com.

Create a filesystem of your requirement (ext4/xfs) over the newly created LVM device. The mkfs command should be executed on the cluster node on which the VG is active. I am creating XFS file system on this new logical volume.

[root@centos8-1 ~]# mkfs.xfs /dev/cluster_vg/cluster_lv
meta-data=/dev/cluster_vg/cluster_lv isize=512    agcount=4, agsize=653056 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2612224, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Create mount points for your HA Logical Volumes on all the cluster nodes

[root@centos8-1 ~]# mkdir /lvm_cluster

I am creating /lvm_cluster on all the cluster nodes on which my logical volume resource will be mounted. You can use a different mount point as per your requirement

Step 9.4: Create cluster resource

Before we create our LVM cluster resource, let us verify the cluster health:

[root@centos8-1 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: centos8-1.example.com (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sun Apr  5 15:30:22 2020
Last change: Sun Apr  5 15:30:12 2020 by root via cibadmin on centos8-1.example.com

3 nodes configured
0 resources configured

Online: [ centos8-1.example.com centos8-2.example.com centos8-3.example.com ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

We have all the cluster nodes marked as online.

Next create a cluster resource with resource agent ocf:heartbeat:LVM-activate so the VG can be managed by Cluster.

[root@centos8-1 ~]# pcs resource create my-vg ocf:heartbeat:LVM-activate vgname=cluster_vg activation_mode=exclusive vg_access_mode=system_id --group HA-LVM

where the resource name is my-vg and resource group name is HA-LVM. These values can be updated as per your requirement.

Create a cluster resource with resource agent ocf:heartbeat:Filesystem so that the cluster controls the mounting of the filesystem and makes it available on one of the cluster nodes.

[root@centos8-1 ~]# pcs resource create my-fs ocf:heartbeat:Filesystem device=/dev/cluster_vg/cluster_lv directory=/lvm_cluster fstype=xfs --group HA-LVM

where the resource name is my-fs, the filesystem created on device /dev/cluster_vg/cluster_lv will be mounted at /lvm_cluster, the filesystem type is xfs, and the resource is kept in the same resource group, i.e. HA-LVM.

Step 10: Verify High Availability Cluster Configuration

Next check the cluster resource status for both the HA LVM cluster resources:

[root@centos8-1 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: centos8-2.example.com (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sun Apr  5 15:46:40 2020
Last change: Sun Apr  5 15:46:37 2020 by root via cibadmin on centos8-1.example.com

3 nodes configured
2 resources configured

Online: [ centos8-1.example.com centos8-2.example.com centos8-3.example.com ]

Full list of resources:

 Resource Group: HA-LVM
     my-fs      (ocf::heartbeat:Filesystem):    Started centos8-1.example.com
     my-vg      (ocf::heartbeat:LVM-activate):  Started centos8-1.example.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The VG is now activated on centos8-1.example.com and filesystem is mounted on the same node.

This can be validated with vgs, lvs, df and other similar commands

[root@centos8-1 ~]# mount | grep lvm_cluster
/dev/mapper/cluster_vg-cluster_lv on /lvm_cluster type xfs (rw,relatime,attr2,inode64,noquota)

[root@centos8-1 ~]# df -Th | grep lvm_cluster
/dev/mapper/cluster_vg-cluster_lv xfs        10G  104M  9.9G   2% /lvm_cluster

Now we will change the Cluster status of centos8-1 from active to standby to make sure our HA LVM cluster resource automatically starts from a different cluster node

[root@centos8-1 ~]# pcs node standby centos8-1.example.com

Now if we check the Linux HA Cluster status, centos8-1 is on standby and our Cluster HA LVM resource is running on centos8-2 cluster node.

[root@centos8-2 ~]# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: centos8-1.example.com (version 2.0.2-3.el8_1.2-744a30d655) - partition with quorum
Last updated: Sun Apr  5 19:29:10 2020
Last change: Sun Apr  5 19:26:39 2020 by hacluster via crmd on centos8-1.example.com

3 nodes configured
2 resources configured

Node centos8-1.example.com: standby
Online: [ centos8-2.example.com centos8-3.example.com ]

Full list of resources:

 Resource Group: HA-LVM
     my-vg      (ocf::heartbeat:LVM-activate):  Started centos8-2.example.com
     my-fs      (ocf::heartbeat:Filesystem):    Started centos8-2.example.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

You can also verify the mounted partitions on centos8-2

[root@centos8-2 ~]# vgs
  VG         #PV #LV #SN Attr   VSize  VFree
  cluster_vg   1   1   0 wz--n-  9.96g     0
  rhel         2   5   0 wz--n- 22.49g <3.00g

[root@centos8-2 ~]# mount | grep lvm_cluster
/dev/mapper/cluster_vg-cluster_lv on /lvm_cluster type xfs (rw,relatime,attr2,inode64,noquota)

[root@centos8-2 ~]# df -Th /lvm_cluster/
Filesystem                        Type  Size  Used Avail Use% Mounted on
/dev/mapper/cluster_vg-cluster_lv xfs    10G  104M  9.9G   2% /lvm_cluster
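
Once the failover test is done, centos8-1 can be taken out of standby again with pcs node unstandby. To see where a resource currently runs without reading the whole status screen, something like the sketch below can help. resource_node is a hypothetical helper, and it assumes the pcs status layout printed by the Pacemaker 2.0.2 build used in this article (newer versions may format the resource lines differently).

```shell
#!/bin/sh
# Read `pcs status` output on stdin and print the node on which the
# resource named in $1 is reported as Started.
resource_node() {
    awk -v name="$1" '$1 == name && $3 == "Started" { print $4 }'
}

# Example on a live cluster (not executed here):
#   pcs node unstandby centos8-1.example.com
#   pcs status | resource_node my-fs
```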

IMPORTANT

If you missed making the system_id_source changes on all the cluster nodes, it is possible that when you put your active cluster node into standby, the HA LVM cluster resource fails to start with the error "The specified vg_access_mode doesn't match the lock_type on VG metadata". To overcome this error scenario:

Manually set the system ID of the VG to the current hostname:

# vgchange --systemid $(uname -n) cluster_vg

After the system ID is manually set, the cluster will assign the appropriate system ID as the resource fails over from that point forward.

The system ID can be verified using the following vgs command

# vgs -o+systemid

Lastly, I hope the steps from this article to set up a High Availability Cluster on CentOS 8 Linux were helpful. Let me know your suggestions and feedback using the comment section.

References:
Pro Linux High Availability Clustering
Setup CentOS HA Cluster
Red Hat 8 Cluster Configuration
How to configure HA-LVM Cluster using system_id in RHEL/CentOS 8 ?