In my last article I explained the different kinds of clustering and their architecture. Before you start configuring a High Availability Cluster, you should be familiar with the basic terminology of clustering. In this article I will share a step-by-step guide to configure a high availability cluster on CentOS Linux 7 using 3 virtual machines. These virtual machines run on my Windows host and were created with Oracle VirtualBox.
Features of Highly Available Clusters?
The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an Open Source, High Availability cluster offering suitable for both small and large deployments.
- Detection and recovery of machine and application-level failures
- Supports practically any redundancy configuration
- Supports both quorate and resource-driven clusters
- Configurable strategies for dealing with quorum loss (when multiple machines fail)
- Supports application startup/shutdown ordering, regardless of which machine(s) the applications are on
- Supports applications that must/must-not run on the same machine
- Supports applications which need to be active on multiple machines
- Supports applications with multiple modes (e.g. master/slave)
What Is Pacemaker?
We will use Pacemaker and Corosync to configure the High Availability Cluster. Pacemaker is a cluster resource manager, that is, the logic responsible for the life cycle of deployed software (and indirectly perhaps even whole systems or their interconnections) under its control within a set of computers (a.k.a. nodes), driven by prescribed rules.
It achieves maximum availability for your cluster services (a.k.a. resources) by detecting and recovering from node- and resource-level failures by making use of the messaging and membership capabilities provided by your preferred cluster infrastructure (either Corosync or Heartbeat), and possibly by utilizing other parts of the overall cluster stack.
Bring up Environment
First of all, before we start to configure the High Availability Cluster, let us bring up our virtual machines with CentOS 7. Below are the configuration details of my VMs:
| |node1|node2|node3|
|OS|CentOS 7|CentOS 7|CentOS 7|
|IP Address (Internal)|10.0.2.20|10.0.2.21|10.0.2.22|
|IP Address (External)|DHCP|DHCP|DHCP|
Edit the /etc/hosts file on every node and add the IP address, followed by an FQDN and a short cluster node name, for every available cluster node network interface.
    # cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.0.2.20   node1.example.com node1
    10.0.2.21   node2.example.com node2
    10.0.2.22   node3.example.com node3
The file has exactly the same content on all three nodes.
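If you would rather not type the entries by hand on every node, they can be generated. The following is only a sketch: the addresses and the domain are the ones used in this article, while the function name hosts_entries is made up for the example.

```shell
#!/usr/bin/env bash
# Print /etc/hosts entries for the three cluster nodes used in this article.
# hosts_entries is a hypothetical helper name, not part of any package.
hosts_entries() {
    local domain="example.com" base="10.0.2" first=20 i
    for i in 1 2 3; do
        # node1 -> 10.0.2.20, node2 -> 10.0.2.21, node3 -> 10.0.2.22
        printf '%s.%d %s.%s %s\n' "$base" $((first + i - 1)) "node$i" "$domain" "node$i"
    done
}

# Print the entries so they can be reviewed first.
hosts_entries

# As root, append them on every node once they look right:
#   hosts_entries >> /etc/hosts
```

Printing first and appending afterwards keeps the script harmless if it is run twice by mistake.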
To finish, check and confirm connectivity among the cluster nodes. You can do this by simply issuing a ping command to every cluster node, from every cluster node.
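The connectivity check can be scripted as a small loop. This is a minimal sketch, assuming the Linux iputils ping (-c sets the packet count, -W the timeout in seconds); check_nodes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Ping each given node once and report whether it answered.
# Returns non-zero if at least one node did not answer.
check_nodes() {
    local node failed=0
    for node in "$@"; do
        if ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
            echo "$node: reachable"
        else
            echo "$node: NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, on each cluster node in turn:
#   check_nodes node1 node2 node3
```

Running it on every node checks both the short names in /etc/hosts and the network path in each direction.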
Stop and disable NetworkManager on all the nodes
    # systemctl disable NetworkManager
    Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
    Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
After removing or disabling the NetworkManager service, you must restart the network service.
To configure a High Availability Cluster it is important that all the nodes in the cluster are connected and synced to an NTP server. Since my machines are in the IST time zone, I will use the India pool of NTP servers.
    # systemctl start ntpd
    # systemctl enable ntpd
    Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
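The ntp.conf used here is not shown, so the following is an assumption rather than the exact configuration: it prints server lines for the India zone of the NTP pool, following the pool's usual N.zone.pool.ntp.org naming. The function name ntp_server_lines is made up for the sketch.

```shell
#!/usr/bin/env bash
# Print "server" lines for /etc/ntp.conf pointing at one country zone of the
# NTP pool. The zone defaults to "in" (India); the host names are an assumption
# based on the pool's naming pattern, not copied from the article's setup.
ntp_server_lines() {
    local zone="${1:-in}" i
    for i in 0 1 2 3; do
        echo "server $i.$zone.pool.ntp.org iburst"
    done
}

ntp_server_lines in

# After adding the lines to /etc/ntp.conf and restarting ntpd,
# the peers can be inspected with:
#   ntpq -p
```

Use the same servers on all three nodes so that their clocks converge on the same source.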
Install pre-requisite rpms
The high availability package is not part of the CentOS repo, so you will need the EPEL repository.
    # yum install epel-release -y
pcs is the Pacemaker configuration tool; installing it brings in Pacemaker and all its dependencies. The fence-agents-all package installs all the default fencing agents available for Red Hat Cluster.
    # yum install pcs fence-agents-all -y
Add firewall rules
    # firewall-cmd --permanent --add-service=high-availability; firewall-cmd --reload
    success
    success
If you run into any problems during testing, you might want to disable the firewall and SELinux entirely until you have everything working. This may create significant security issues and should not be performed on machines that will be exposed to the outside world, but may be appropriate during development and testing on a protected host.
Step by Step Guide to configure High Availability Cluster
The installed packages will create a hacluster user with a disabled password. While this is fine for running pcs commands locally, the account needs a login password in order to perform such tasks as syncing the corosync configuration, or starting and stopping the cluster on other nodes.
Set the password for the hacluster user on each cluster node using the following command. Here my password is "password".
    # echo password | passwd --stdin hacluster
    Changing password for user hacluster.
    passwd: all authentication tokens updated successfully.
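A literal "password" is fine for a lab, but outside one you would probably want something less guessable. Below is a minimal sketch that generates a random alphanumeric password from /dev/urandom; gen_password is a hypothetical helper, and the chpasswd line in the comment is one possible way to apply the result. Generate the password once and set the same value on every node, since the later authentication step is asked for a single password.

```shell
#!/usr/bin/env bash
# Generate a random alphanumeric password of the requested length (default 20).
# gen_password is a made-up helper name for this sketch.
gen_password() {
    local length="${1:-20}"
    # Keep only letters and digits from the random stream, then cut to length.
    LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
    echo
}

# Usage, as root (repeat the chpasswd step with the same value on every node):
#   pw="$(gen_password)"
#   echo "hacluster:$pw" | chpasswd
```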
Start the pcsd daemon on each node and enable it so that it starts on boot:
    # systemctl enable --now pcsd
    Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Next we need to configure corosync. On any one of the nodes, use pcs cluster auth to authenticate as the hacluster user:
    # pcs cluster auth node1.example.com node2.example.com node3.example.com
    Username: hacluster
    Password:
    node2.example.com: Authorized
    node1.example.com: Authorized
    node3.example.com: Authorized
Finally, run the following command on the first node to create the cluster and start it. Here our cluster name will be mycluster.
    # pcs cluster setup --start --name mycluster node1.example.com node2.example.com node3.example.com
    Destroying cluster on nodes: node1.example.com, node2.example.com, node3.example.com...
    node3.example.com: Stopping Cluster (pacemaker)...
    node2.example.com: Stopping Cluster (pacemaker)...
    node1.example.com: Stopping Cluster (pacemaker)...
    node1.example.com: Successfully destroyed cluster
    node2.example.com: Successfully destroyed cluster
    node3.example.com: Successfully destroyed cluster
    Sending 'pacemaker_remote authkey' to 'node1.example.com', 'node2.example.com', 'node3.example.com'
    node1.example.com: successful distribution of the file 'pacemaker_remote authkey'
    node2.example.com: successful distribution of the file 'pacemaker_remote authkey'
    node3.example.com: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    node1.example.com: Succeeded
    node2.example.com: Succeeded
    node3.example.com: Succeeded
    Starting cluster on nodes: node1.example.com, node2.example.com, node3.example.com...
    node2.example.com: Starting Cluster...
    node1.example.com: Starting Cluster...
    node3.example.com: Starting Cluster...
    Synchronizing pcsd certificates on nodes node1.example.com, node2.example.com, node3.example.com...
    node2.example.com: Success
    node1.example.com: Success
    node3.example.com: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    node1.example.com: Success
    node3.example.com: Success
    node2.example.com: Success
Enable the cluster services, i.e. pacemaker and corosync, so that they start automatically on boot:
    # pcs cluster enable --all
    node1.example.com: Cluster Enabled
    node2.example.com: Cluster Enabled
    node3.example.com: Cluster Enabled
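The authenticate, setup and enable steps can be collected into one script. This is a sketch under a few assumptions: it uses the same pcs syntax as the commands in this article, the run wrapper and the NODES array are made-up names, and by default it only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Cluster bring-up commands from this article, collected into one script.
# With DRY_RUN=1 (the default) the commands are only printed; set DRY_RUN=0
# to execute them as root on one node, after pcsd is running on all nodes.
DRY_RUN="${DRY_RUN:-1}"
NODES=(node1.example.com node2.example.com node3.example.com)

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up_cluster() {
    # -u supplies the user name; pcs still prompts for the password.
    run pcs cluster auth "${NODES[@]}" -u hacluster
    run pcs cluster setup --start --name mycluster "${NODES[@]}"
    run pcs cluster enable --all
}

bring_up_cluster
```

The dry-run default is a deliberate choice: pcs cluster setup destroys any existing cluster configuration on the listed nodes, so it is worth reading the printed commands before running them for real.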
Lastly, check the cluster status:
    # pcs cluster status
    Cluster Status:
     Stack: corosync
     Current DC: node2.example.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
     Last updated: Sat Oct 27 08:41:52 2018
     Last change: Sat Oct 27 08:41:18 2018 by hacluster via crmd on node2.example.com
     3 nodes configured
     0 resources configured
    PCSD Status:
      node3.example.com: Online
      node1.example.com: Online
      node2.example.com: Online
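If you want to check this status from a script, for example to wait until all nodes are online, the output can be parsed. The sketch below counts the nodes listed as Online in the PCSD Status section; count_online is a hypothetical helper, and the sample text is a shortened copy of the output shown in this article.

```shell
#!/usr/bin/env bash
# Count how many nodes the "PCSD Status" section reports as Online.
# Reads `pcs cluster status` output on standard input.
count_online() {
    awk '
        /PCSD Status:/ { in_pcsd = 1; next }          # section header
        in_pcsd && /: Online[[:space:]]*$/ { n++ }    # one line per online node
        END { print n + 0 }
    '
}

# Shortened sample of the status output.
sample='Cluster Status:
 Stack: corosync
 3 nodes configured
 0 resources configured
PCSD Status:
  node3.example.com: Online
  node1.example.com: Online
  node2.example.com: Online'

printf '%s\n' "$sample" | count_online

# On a live cluster:
#   pcs cluster status | count_online
```

For the sample above the script prints 3, which matches the number of configured nodes.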
To check the cluster's quorum status, use the corosync-quorumtool command:
    # corosync-quorumtool
    Quorum information
    ------------------
    Date:             Sat Oct 27 08:43:22 2018
    Quorum provider:  corosync_votequorum
    Nodes:            3
    Node ID:          1
    Ring ID:          1/8
    Quorate:          Yes
    Votequorum information
    ----------------------
    Expected votes:   3
    Highest expected: 3
    Total votes:      3
    Quorum:           2
    Flags:            Quorate
    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 node1.example.com (local)
             2          1 node2.example.com
             3          1 node3.example.com
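The "Quorum: 2" value follows from the number of votes: with one vote per node, a partition needs more than half of the expected votes, that is floor(N / 2) + 1. The short sketch below works this out for a few cluster sizes; quorum_for is a made-up helper name, and it assumes the default of one vote per node with no special two-node handling.

```shell
#!/usr/bin/env bash
# Votes needed for quorum when every node has exactly one vote:
# floor(N / 2) + 1 (shell integer division already rounds down).
quorum_for() {
    local nodes="$1"
    echo $(( nodes / 2 + 1 ))
}

for n in 2 3 4 5; do
    q="$(quorum_for "$n")"
    echo "$n nodes -> quorum $q, tolerates $(( n - q )) node failure(s)"
done
```

For the 3-node cluster in this article this gives a quorum of 2, so the cluster stays quorate if one node fails. A fourth node would not improve that, which is why odd node counts are the usual choice.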
To get the live status of the cluster, use crm_mon:
    # crm_mon
    Connection to the CIB terminated
Verify the cluster configuration
Before we make any changes, it’s a good idea to check the validity of the configuration.
    # crm_verify -L -V
       error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid
As you can see, the tool has found some errors.
In order to guarantee the safety of your data, fencing (also called STONITH) is enabled by default. However, the cluster also knows when no STONITH configuration has been supplied and reports this as a problem (since the cluster will not be able to make progress if a situation requiring node fencing arises).
We will disable this feature for now and configure it later. To disable STONITH, set the stonith-enabled cluster property to false. This is a cluster-wide property, so it is enough to run the command on one node:
    # pcs property set stonith-enabled=false
Next, re-validate the cluster. This time the command returns without reporting any errors:
    # crm_verify -L -V
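Because a clean run prints nothing, it can be handy to wrap the check so that it always says something. The following is a sketch: verify_cluster is a hypothetical wrapper, and it assumes only that crm_verify exits with a non-zero status when it finds problems.

```shell
#!/usr/bin/env bash
# Run crm_verify against the live cluster and summarise the result.
# Return codes of this wrapper: 0 = valid, 1 = problems found, 2 = tool missing.
verify_cluster() {
    if ! command -v crm_verify >/dev/null 2>&1; then
        echo "crm_verify not found (is pacemaker installed?)"
        return 2
    fi
    # -L checks the live cluster configuration, -V prints the details.
    if crm_verify -L -V; then
        echo "cluster configuration is valid"
    else
        echo "cluster configuration has problems, see the messages above"
        return 1
    fi
}

# Usage, as root on any cluster node:
#   verify_cluster
```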
That is all about configuring a High Availability Cluster on Linux. In my next article I will share the steps to configure cluster resources and resource constraints, with examples.
Lastly, I hope the steps from this article to configure a high availability cluster on Linux were helpful. Let me know your suggestions and feedback using the comment section.