Before starting, I hope you are familiar with the different cluster types and their architecture. In this article I will explain the steps to configure a two-node cluster on CentOS / RHEL 7 Linux nodes. For the sake of this article I am using Oracle VirtualBox installed on my Linux server.
How is a two-node cluster different from a cluster with three or more nodes?
Quorum is the minimum number of cluster member votes required to perform a cluster operation. Without quorum, the cluster cannot operate. Quorum is achieved when the majority of cluster members vote to execute a specific cluster operation; if the majority of the cluster members do not vote, the operation will not be performed.
In a two-node cluster configuration, the maximum number of expected votes is two, with each cluster node having one vote. In a failure scenario where one of the nodes goes down, only one node is active and it holds only one vote. In such a configuration quorum cannot be reached, since a majority of the votes cannot be delivered. The single remaining cluster node is stuck at 50 percent of the votes and will never get past it, so the cluster will never operate normally this way.
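To make the arithmetic explicit, the votequorum majority calculation effectively amounts to the standard strict-majority formula (the two cases below are simply the scenarios discussed in this article):
quorum = floor(expected_votes / 2) + 1
For a three-node cluster: floor(3 / 2) + 1 = 2, so the cluster survives the loss of one node.
For a two-node cluster: floor(2 / 2) + 1 = 2, so both votes are required and the loss of either node breaks quorum.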
2-Node Cluster Challenges
- Quorum problems: a majority of the votes is no longer achievable after a failure in a 2-node cluster
- Split brain can happen. With fencing enabled, both nodes will try to fence one another.
- The cluster won't start until all nodes are available. This behavior can easily be disabled using the wait_for_all parameter
It is recommended to create a two-node cluster with wait_for_all=0, as shown in the example below. When creating a 2-node cluster, the two_node mode will be enabled in corosync.conf and will automatically disappear if you add more nodes to your cluster.
pcs cluster setup --start --enable --name cluster_name --wait_for_all=0 node1.example.com node2.example.com
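If you want to verify what pcs generated, you can inspect the quorum section of the corosync configuration on either node. This is only a quick sanity check under the assumption that the setup command above completed successfully; the exact layout may differ slightly between pcs versions, but for a two-node cluster created with --wait_for_all=0 it should contain something like:
[root@node1 ~]# grep -A 4 'quorum {' /etc/corosync/corosync.conf
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}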
Earlier I had shared a step-by-step article to configure a three-node HA cluster. From that same setup I have removed node3.example.com so that I can reuse it to demonstrate this article.
[root@node1 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Wed Dec 26 16:14:02 2018
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/368
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1.example.com (local)
         2          1 node2.example.com
Here I need a minimum of two votes to keep my cluster alive and functioning.
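Depending on your pcs version, the same votequorum details can also be pulled through pcs itself instead of calling corosync-quorumtool directly. This assumes a pcs build that ships the quorum sub-commands, which later CentOS / RHEL 7 releases do:
[root@node1 ~]# pcs quorum status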
What if one of my cluster nodes goes down?
Let us manually try to stop one of the cluster nodes
[root@node2 ~]# pcs cluster stop node2.example.com
Error: Stopping the node(s) will cause a loss of the quorum, use --force to override
Since I only have two nodes in my cluster, the tool won't easily allow me to shut down a cluster node, hence I need to use --force
[root@node2 ~]# pcs cluster stop node2.example.com --force
node2.example.com: Stopping Cluster (pacemaker)...
node2.example.com: Stopping Cluster (corosync)...
So let us now check the status of our cluster
[root@node1 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Wed Dec 26 16:15:54 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          1/372
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1.example.com (local)
So, as expected, our cluster is no longer quorate, since the expected votes are higher than the total votes.
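Before moving on to the reconfiguration, I bring the stopped node back into the cluster so that both nodes are reachable again. This is simply starting the node that was stopped in the previous step:
[root@node1 ~]# pcs cluster start node2.example.com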
So we must do some additional configuration to have a working two-node cluster. With CentOS 6 this could have been achieved using a quorum disk.
Why is a quorum disk not possible with a cluster on CentOS 7?
- The quorum provider in the CentOS 7 cluster stack is corosync.
- The CentOS 7 cluster stack, as opposed to the CentOS 6 cluster stack, only provides one option to work around the quorum issue, which is a two-node-specific cluster configuration.
- The CentOS 7 cluster stack lacks the Quorum disk workaround option, mainly due to the additional Quorum configuration options provided by Corosync version 2.
- These additional Corosync version 2 options actually make the Quorum disk unnecessary in a two node or multinode cluster configuration.
- The new Quorum features of Corosync version 2 are definitely welcome, are well thought out, and can replace the need for a Quorum disk in every way.
As already mentioned, the quorum provider in the CentOS 7 cluster stack is Corosync version 2. Therefore, the cluster quorum configuration is provided in the corosync.conf configuration file. With the previous Corosync version (version 1), the quorum capabilities were provided by CMAN; with Corosync version 2, included in the CentOS 7 cluster stack, the quorum capabilities are provided by Corosync itself, specifically by the votequorum process.
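A quick way to confirm which quorum provider your cluster is using is to look it up in the corosync configuration itself. This is only an illustrative check against the standard corosync.conf location used throughout this article, and it should print something like:
[root@node1 ~]# grep provider /etc/corosync/corosync.conf
    provider: corosync_votequorum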
How to configure a two-node cluster with CentOS / RHEL 7 Linux?
If you are configuring a two-node cluster on the CentOS 7 cluster stack, you should enable the two_node cluster option. Before starting with the configuration changes, stop your cluster services
[root@node1 ~]# pcs cluster stop --all
node1.example.com: Stopping Cluster (pacemaker)...
node2.example.com: Stopping Cluster (pacemaker)...
node1.example.com: Stopping Cluster (corosync)...
node2.example.com: Stopping Cluster (corosync)...
Next, add the following parameters to corosync.conf under the quorum section:
# vim /etc/corosync/corosync.conf
quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}
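Keep in mind that corosync.conf has to be identical on both nodes. Below is a minimal sketch of how I would push the edited file to the other node and bring the cluster back up, assuming pcsd is running on both nodes; otherwise you can simply copy the file over with scp before starting the cluster:
[root@node1 ~]# pcs cluster sync
[root@node1 ~]# pcs cluster start --all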
By enabling the two_node cluster option, the quorum is artificially set to 1, which means that the cluster will be quorate and continue to operate even in the event of a failure of one cluster node. Enabling the two_node cluster option also automatically enables the additional wait_for_all option. Let us check the cluster status; as you can see, we have additional flags enabled for our two-node cluster
[root@node1 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Wed Dec 26 16:08:19 2018
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/356
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1.example.com (local)
         2          1 node2.example.com
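The 2Node flag above confirms that the mode is active. If you want to double-check the values corosync actually loaded from the configuration file, its runtime key/value store can be queried as well. This is a hedged example; the key names mirror the quorum section shown earlier, and on your system the grep may also match additional runtime.votequorum entries:
[root@node1 ~]# corosync-cmapctl | grep quorum
quorum.provider (str) = corosync_votequorum
quorum.two_node (u8) = 1
quorum.wait_for_all (u8) = 0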
Now let us try to stop one of the cluster nodes
[root@node1 ~]# pcs cluster stop node2.example.com
node2.example.com: Stopping Cluster (pacemaker)...
node2.example.com: Stopping Cluster (corosync)...
As you can see, this time the tool did not prevent us from stopping the cluster node as it did earlier.
Let us check the quorum status
[root@node1 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Wed Dec 26 16:09:30 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          1/360
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           1
Flags:            2Node Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1.example.com (local)
So our cluster keeps functioning even with only one node active.
Let us also understand some other basic terminology associated with the corosync configuration; a sample quorum section combining these options is sketched right after the list.
- wait_for_all (default: 0): The general behavior of the votequorum process is to switch from inquorate to quorate as soon as possible; as soon as the majority of nodes are visible to each other, the cluster becomes quorate. The wait_for_all option, or WFA, allows you to configure the cluster to become quorate for the first time only after all the nodes have become visible. If the two_node option is enabled, the wait_for_all option is automatically enabled as well.
- last_man_standing (default: 0) / last_man_standing_window (default: 10 seconds): The general behavior of the votequorum process is to set the expected_votes parameter and quorum at startup. Enabling the last_man_standing option, or LMS, allows the cluster to dynamically recalculate the expected_votes parameter and quorum under specific circumstances. It is important to enable the WFA option when using the LMS option in high-availability clusters.
- auto_tie_breaker (default: 0): When the auto_tie_breaker option, or ATB, is enabled, the cluster can survive the failure of up to 50 percent of its nodes at the same time. The cluster partition, or the set of nodes that are still in contact with the node that has the lowest nodeid, will remain quorate. The other nodes will be inquorate.
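As promised above, here is a minimal sketch of a quorum section that combines these options. The values are purely illustrative; I am assuming a hypothetical five-node cluster rather than the two-node setup from this article, and the last_man_standing_window value is in milliseconds:
quorum {
    provider: corosync_votequorum
    expected_votes: 5
    wait_for_all: 1
    last_man_standing: 1
    last_man_standing_window: 10000
    auto_tie_breaker: 1
}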
You must always disable fencing in a two-node cluster configuration without the Quorum disk to avoid fence race scenarios, where the two cluster nodes kill each other.
Lastly, I hope the steps in this article to configure a two-node cluster on Linux (CentOS / RHEL 7) were helpful. Let me know your suggestions and feedback using the comment section.
On my test machine I set wait_for_all=0, and if I turn off one node the state remains quorate; the same happens if I turn on only one node. But the resources remain stopped even when the cluster state becomes quorate.
Resources only start when both nodes are up.
Hi, some supplement: I have the same corosync quorum settings following the article and no quorum disk, but I added two fence_ipmilan devices to the cluster. Every time after one node is restarted, the other one gets fenced and powered off. The stonith action is off and fence_ipmilan has its default settings. I suspect this is caused by the fence/stonith setting without a quorum disk?
Two-node clusters are always a little tricky. wait_for_all requires that a node see all other nodes at least once before becoming quorate. This helps prevent a split-brain scenario in which multiple cluster partitions claim quorum independently of one another. Together, the two_node and wait_for_all options allow one node to maintain quorum if the other node fails. However, if the healthy node reboots or otherwise leaves the cluster and has to rejoin, it cannot form quorum until it sees the failed node. Since the failed node is down, it is necessary to bypass this wait_for_all requirement in order to resume resource management. So you may configure your cluster without the wait_for_all parameter, and in terms of fencing you may add a delay for a certain node so that the fencing device waits before removing a node from the cluster.
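A minimal sketch of how such a delay could be added to an existing fence device, assuming a stonith resource named fence_node1 already exists in the cluster (both the resource name and the 15-second value are placeholders for illustration):
[root@node1 ~]# pcs stonith update fence_node1 delay=15
With a delay set on only one of the two fence devices, that node gets a head start in a fence race, so the two nodes do not power each other off at the same time.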
Hi, thanks for your reply.
So do you mean in this case we don't need a votequorum disk and can also add fence devices to the cluster?
Does "configure your cluster without the wait_for_all parameter" mean setting wait_for_all=0?
BR
No, wait_for_all=0 means disabling it. You can set it to 1 or just remove the variable.

Hi admin, this is a wonderful article. I followed the steps to set up a 2-node cluster, but now I have an issue and hope you can help.
I have a 2-node cluster with Red Hat Linux 7.5 and FC SAN shared storage.
Does a 2-node cluster not need a votequorum disk to fail over? If so, can we still not add fence resources to it?
Now I want to use fencing, so must I enable a vote quorum for the cluster? If I have a shared storage disk to use as a quorum disk (/dev/mapper/mpatha), how do I configure it in the cluster?
Do you have such a solution, or could you give more detail about this? Thank you in advance!