In this article I will share a step-by-step tutorial to configure DRBD Cluster File System using Pacemaker 2.0 in RHEL/CentOS 8. We will cover the following topics:
- Setup DRBD Cluster File System
- Adding DRBD as Master/Slave Resource
- Adding DRBD as a Highly Available Resource
- Verifying Cluster Failover
Pre-requisites
I hope you are familiar with High Availability Cluster architecture and its basic terminology. Below are the mandatory prerequisites which must be performed in the provided order before you set up a DRBD Cluster File System using Pacemaker in RHEL/CentOS 8. You may have already completed some of these steps:
Install KVM
Bring up your physical server with RHEL/CentOS 8 and install KVM on it.
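If you still need to install KVM, a minimal sketch looks like the commands below, assuming the RHEL/CentOS 8 virtualization packages are available in your configured repositories; virt-host-validate is an optional check that the host supports hardware virtualization:
# dnf install -y qemu-kvm libvirt virt-install virt-viewer
# systemctl enable --now libvirtd
# virt-host-validate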
Create KVM Virtual Machines
Since we are creating the DRBD Cluster File System on KVM Virtual Machines, we must create multiple KVM Virtual Machines.
You can create KVM Virtual Machines using any of the following methods:
⇒ Cockpit Web Console GUI
⇒ Virtual Machine Manager (deprecated starting RHEL/CentOS 8)
⇒ virt-install Command Line Tool
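For example, one of the lab VMs could be created with virt-install roughly as shown below. The disk path, ISO path and vCPU count are placeholders for your environment, while the VM name, memory and disk size match the lab table further down:
# virt-install \
  --name centos8-2 \
  --memory 10240 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/centos8-2.qcow2,size=40 \
  --os-variant centos8 \
  --network network=default \
  --cdrom /path/to/CentOS-8.iso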
Setup KVM High Availability Cluster
Since we intend to set up a DRBD Cluster File System, we need a High Availability Cluster, so you must configure a KVM HA Cluster on your RHEL/CentOS 8 Linux server.
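As a reminder, the cluster bootstrap on RHEL/CentOS 8 (pcs 0.10 / Pacemaker 2.0) looks roughly like the sketch below. The cluster name ha_cluster and the hacluster password are placeholders, and the High Availability repository must already be enabled.
On all three cluster nodes:
# dnf install -y pcs pacemaker fence-agents-all
# systemctl enable --now pcsd
# passwd hacluster
On any one node:
# pcs host auth centos8-2 centos8-3 centos8-4 -u hacluster
# pcs cluster setup ha_cluster centos8-2 centos8-3 centos8-4
# pcs cluster start --all
# pcs cluster enable --all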
Install and Configure DRBD
Next you need to install and configure DRBD (Distributed Replicated Block Device) on these KVM Virtual Machines to set up Linux disk replication. Only once you have a working DRBD configuration can we use it for the DRBD Cluster File System on our HA Cluster.
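For reference, a working three-node DRBD 9 resource definition for this lab might look like the sketch below (in /etc/drbd.d/drbd1.res). The backing disk /dev/sdb1 and the port 7789 are assumptions; the resource name drbd1, the DRBD device /dev/drbd1, and the node names and IPs match what is used throughout this article:
resource drbd1 {
   device /dev/drbd1;
   disk /dev/sdb1;
   meta-disk internal;
   on centos8-2.example.com {
      address 192.168.122.10:7789;
      node-id 0;
   }
   on centos8-3.example.com {
      address 192.168.122.11:7789;
      node-id 1;
   }
   on centos8-4.example.com {
      address 192.168.122.12:7789;
      node-id 2;
   }
   connection-mesh {
      hosts centos8-2.example.com centos8-3.example.com centos8-4.example.com;
   }
}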
Lab Environment
I will use my existing KVM High Availability Cluster where I have installed and configured DRBD Storage
| | KVM Host | KVM VM1 | KVM VM2 | KVM VM3 |
|---|---|---|---|---|
| Hostname | rhel-8 | centos8-2 | centos8-3 | centos8-4 |
| FQDN | rhel-8.example.com | centos8-2.example.com | centos8-3.example.com | centos8-4.example.com |
| NIC 1 (Public IP) | 10.43.138.12 | NA | NA | NA |
| NIC 2 (Private IP) | 192.168.122.1 (NAT) | 192.168.122.10 | 192.168.122.11 | 192.168.122.12 |
| OS | RHEL 8.1 | CentOS 8.1 | CentOS 8.1 | CentOS 8.1 |
| Memory (GB) | 128 | 10 | 10 | 10 |
| Storage (GB) | 500 | 40 | 40 | 40 |
| Role | PXE Server | HA Cluster Node | HA Cluster Node | HA Cluster Node |
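For name resolution between the cluster nodes I rely on /etc/hosts entries on each VM (an assumption; a DNS setup would work just as well), matching the private IPs from the table above:
192.168.122.10   centos8-2.example.com   centos8-2
192.168.122.11   centos8-3.example.com   centos8-3
192.168.122.12   centos8-4.example.com   centos8-4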
How to configure DRBD Cluster File System?
There are a few ways in which you can add DRBD to your High Availability Cluster:
- Add DRBD as a background service
- Configure Master/Slave DRBD resource
- Configure a highly available DRBD resource
I will share detailed examples for all these scenarios in this DRBD Tutorial
Method 1: Using DRBD as a background service in a Pacemaker cluster
In this section you will see that autonomous DRBD storage can be used like local storage, so integrating it into a Pacemaker cluster is simply a matter of pointing your mount points at the DRBD device.
1.1: DRBD Configuration to auto-promote
First of all, we will use the auto-promote feature of DRBD, so that DRBD automatically sets itself Primary when needed. This will probably apply to all of your resources, so setting it as a default in the common section makes sense. Append the following to your existing /etc/drbd.d/global_common.conf, as shown below:
global {
   usage-count no;
}
common {
   options {
      auto-promote yes;
   }
   net {
      protocol C;
   }
}
Next, copy this file manually to all the cluster nodes:
[root@centos8-2 ~]# scp /etc/drbd.d/global_common.conf centos8-3:/etc/drbd.d/
[root@centos8-2 ~]# scp /etc/drbd.d/global_common.conf centos8-4:/etc/drbd.d/
Execute drbdadm adjust to refresh the DRBD configuration on all the cluster nodes:
# drbdadm adjust <resource>
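If you want to confirm that the option is now active, drbdadm dump prints the configuration as DRBD has parsed it; here I assume the resource is named drbd1, as used later in this article:
# drbdadm dump drbd1 | grep -A 2 options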
1.2: Create DRBD Cluster File System Resource
Next we will create a heartbeat Filesystem resource for the DRBD Cluster File System. I have already explained the meaning of the individual fields of this command.
[root@centos8-2 ~]# pcs resource create fs_drbd ocf:heartbeat:Filesystem device=/dev/drbd1 directory=/share fstype=ext4
1.3: Verify DRBD Resource and Device Status
Next check the DRBD resource status
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-2
Our DRBD Cluster File System resource has started successfully on the centos8-2 cluster node.
On centos8-2 you can check the DRBD status; it shows this node as "Primary" while the other two cluster nodes are Secondary:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-3.example.com role:Secondary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
The DRBD device is also mounted on the centos8-2 cluster node:
[root@centos8-2 ~]# df -h /share/
Filesystem Size Used Avail Use% Mounted on
/dev/drbd1 2.0G 6.0M 1.9G 1% /share
1.4: Verify DRBD Failover
Next we will perform a DRBD failover to verify our DRBD Cluster File System. Since the resource was active on centos8-2, I will put this KVM cluster node into standby:
[root@centos8-2 ~]# pcs node standby centos8-2
Next, check the status of the DRBD Cluster File System resource. It has started successfully on a different KVM cluster node, i.e. centos8-3, so our DRBD failover is working as expected:
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-3
You can verify the same on centos8-3:
[root@centos8-3 ~]# df -h /share/
Filesystem Size Used Avail Use% Mounted on
/dev/drbd1 2.0G 6.0M 1.9G 1% /share
The drbdadm command also shows centos8-3 as "Primary":
[root@centos8-3 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-2.example.com role:Secondary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
Method 2: Adding DRBD as a PCS Master/Slave cluster resource
2.1: Create DRBD Clone Resource
- Next we will add the DRBD storage as Master/Slave using a clone resource in the KVM Cluster.
- We will retain the Cluster File System resource we created earlier, as we need it here for our Master/Slave configuration.
- Since we have a three-node KVM Cluster, I am using clone-max as 3; you can change these values based on your environment.
pcs resource master is deprecated; Pacemaker 2.0 in RHEL 8 renamed master/slave resources to "promotable clone" resources, and some command syntax changed as a result. For more details check: How do I create a promotable clone resource in a Pacemaker cluster?
[root@centos8-2 ~]# pcs resource create web_drbd ocf:linbit:drbd drbd_resource=drbd1 promotable promoted-max=1 promoted-node-max=1 clone-max=3 clone-node-max=1 notify=true
Here we are creating a promotable resource web_drbd using our DRBD resource. I have already explained the meaning of the individual fields of this command.
2.2: Verify DRBD Resource and Device Status
Check the DRBD resource status
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-3
 Clone Set: web_drbd-clone [web_drbd] (promotable)
     Masters: [ centos8-3 ]
     Slaves: [ centos8-2 centos8-4 ]
Now we have a single Master node where the web_drbd-clone resource is running, while the other two cluster nodes are Slaves.
The drbdadm command shows the same status, i.e. centos8-3 is our Primary server:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Secondary
  disk:UpToDate
  centos8-3.example.com role:Primary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
To get more details of the clone resource
[root@centos8-2 ~]# pcs resource config web_drbd-clone
 Clone: web_drbd-clone
  Meta Attrs: clone-max=3 clone-node-max=1 notify=true promotable=true promoted-max=1 promoted-node-max=1
  Resource: web_drbd (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=drbd1
   Operations: demote interval=0s timeout=90 (web_drbd-demote-interval-0s)
               monitor interval=20 role=Slave timeout=20 (web_drbd-monitor-interval-20)
               monitor interval=10 role=Master timeout=20 (web_drbd-monitor-interval-10)
               notify interval=0s timeout=90 (web_drbd-notify-interval-0s)
               promote interval=0s timeout=90 (web_drbd-promote-interval-0s)
               reload interval=0s timeout=30 (web_drbd-reload-interval-0s)
               start interval=0s timeout=240 (web_drbd-start-interval-0s)
               stop interval=0s timeout=100 (web_drbd-stop-interval-0s)
2.3: Configure Resource Constraint
I hope you are familiar with resource constraints. Currently the DRBD clone resource (web_drbd-clone) and the DRBD Cluster File System resource (fs_drbd) are independent of each other.
We must make sure that both resources are colocated on the same node:
[root@centos8-2 ~]# pcs constraint colocation add fs_drbd with web_drbd-clone INFINITY with-rsc-role=Master
Also, the DRBD clone resource must be promoted before the DRBD Cluster File System resource can start:
[root@centos8-2 ~]# pcs constraint order promote web_drbd-clone then start fs_drbd
Adding web_drbd-clone fs_drbd (kind: Mandatory) (Options: first-action=promote then-action=start)
Check the list of applied constraints:
[root@centos8-2 ~]# pcs constraint show
Location Constraints:
Ordering Constraints:
  promote web_drbd-clone then start fs_drbd (kind:Mandatory)
Colocation Constraints:
  fs_drbd with web_drbd-clone (score:INFINITY) (with-rsc-role:Master)
Ticket Constraints:
2.4: Verify DRBD Failover
Since our DRBD Cluster File System and clone resources are running on centos8-3, I will put this cluster node into standby to verify DRBD failover:
[root@centos8-2 ~]# pcs node standby centos8-3
Verify the cluster resource status
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-2
 Clone Set: web_drbd-clone [web_drbd] (promotable)
     Masters: [ centos8-2 ]
     Slaves: [ centos8-4 ]
     Stopped: [ centos8-3 ]
So, as expected, our DRBD resources have migrated to the centos8-2 cluster node, while centos8-3 has been marked as "Stopped".
You can also verify the resource status using drbdadm. Our DRBD failover is working as expected:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-3.example.com connection:Connecting
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
Since centos8-3 is marked as stopped, drbdadm is unable to connect to that cluster node. Once the KVM cluster node centos8-3 becomes active again, drbdadm will re-establish the connection.
I will change the status of centos8-3 back to active:
[root@centos8-2 ~]# pcs node unstandby centos8-3
We changed the status of our cluster node centos8-3 to active, and we can see that drbdadm has successfully re-established the connection with centos8-3:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-3.example.com role:Secondary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
Method 3: Adding DRBD as a highly available resource
- Clone resources in a High Availability Pacemaker cluster are those that can run on multiple nodes, usually on all of them, simultaneously.
- This can be useful for starting daemons like dlm_controld (via a controld resource), or clvmd and cmirrord (via a clvm resource), that are needed by other highly available or load-balanced resources (see the sketch just after this list).
- For a DRBD cluster, however, this is not very useful, because at any point in time the resource is active on only one KVM cluster node.
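For illustration only, cloning a daemon such as dlm_controld across all nodes would look roughly like this; the resource name dlm is a placeholder, and this resource is not needed for the DRBD examples in this article:
[root@centos8-2 ~]# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s clone interleave=true ordered=true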
I will delete the existing DRBD resources to demonstrate this example:
[root@centos8-2 ~]# pcs resource delete fs_drbd
Attempting to stop: fs_drbd... Stopped

[root@centos8-2 ~]# pcs resource delete web_drbd
Attempting to stop: web_drbd... Stopped
3.1: Create DRBD Clone resource
In this example I have configured a single master node with a maximum of 3 clones, because we have a 3-node KVM HA Cluster. The resource name will be web_drbd.
Create the DRBD clone resource; I have already explained the meaning of the individual fields of this command:
[root@centos8-2 ~]# pcs resource create web_drbd ocf:linbit:drbd drbd_resource=drbd1 clone master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
Next we will create the DRBD Cluster File System resource as we did in the earlier examples:
[root@centos8-2 ~]# pcs resource create fs_drbd ocf:heartbeat:Filesystem device=/dev/drbd1 directory=/share fstype=ext4
3.2: Verify DRBD Resource and Device Status
Check the cluster resource status to make sure both resources have started successfully:
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 Clone Set: web_drbd-clone [web_drbd]
     Started: [ centos8-2 centos8-3 centos8-4 ]
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-2
Since the DRBD Cluster File System is running on centos8-2, this node is considered Primary:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-3.example.com role:Secondary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
3.3: Configure Resource Constraint
I hope you are familiar with resource constraints. We will configure the constraints the same way as we did for the Master/Slave configuration.
We will link both cluster resources to make sure they are started on the same cluster node:
[root@centos8-2 ~]# pcs constraint colocation add fs_drbd with web_drbd-clone
The clone resource must be started before the file system resource:
[root@centos8-2 ~]# pcs constraint order start web_drbd-clone then fs_drbd
Adding web_drbd-clone fs_drbd (kind: Mandatory) (Options: first-action=start then-action=start)
Check the applied constraint rules
[root@centos8-2 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
  start web_drbd-clone then start fs_drbd (kind:Mandatory)
Colocation Constraints:
  fs_drbd with web_drbd-clone (score:INFINITY)
Ticket Constraints:
3.4: Verify DRBD Failover
The cluster resources have started successfully, so we must also verify the DRBD failover scenario to make sure the resources are highly available.
Since our cluster resources are running on centos8-2, we will put it into standby:
[root@centos8-2 ~]# pcs node standby centos8-2
Next, check the cluster resource status. As expected, our DRBD resources have switched to a different KVM cluster node and have started on centos8-3:
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 Clone Set: web_drbd-clone [web_drbd]
     Started: [ centos8-3 centos8-4 ]
     Stopped: [ centos8-2 ]
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-3
Also, the state of centos8-2 is marked as "Stopped".
Check the status of the drbd1 device using drbdadm. Here also we see that drbd1 considers centos8-3 as Primary, while it is unable to connect to centos8-2:
[root@centos8-3 ~]# drbdadm status drbd1
drbd1 role:Primary
  disk:UpToDate
  centos8-2.example.com connection:Connecting
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
Let us bring our standby cluster node back to the active state:
[root@centos8-2 ~]# pcs node unstandby centos8-2
Next, check the drbdadm status for the drbd1 device. Once the cluster node is active, the drbd1 device re-connects to the respective cluster node, hence the DRBD failover is working as expected:
[root@centos8-2 ~]# drbdadm status drbd1
drbd1 role:Secondary
  disk:UpToDate
  centos8-3.example.com role:Primary
    peer-disk:UpToDate
  centos8-4.example.com role:Secondary
    peer-disk:UpToDate
We can also check that all our cluster nodes are considered for the DRBD resource
[root@centos8-2 ~]# pcs resource status
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started centos8-4
 Clone Set: web_drbd-clone [web_drbd]
     Started: [ centos8-2 centos8-3 centos8-4 ]
 fs_drbd        (ocf::heartbeat:Filesystem):    Started centos8-3
So now we have a working DRBD Cluster File System configured using three different methods.
Lastly, I hope the steps from this article to configure a KVM DRBD Cluster File System using Pacemaker 2.0 on RHEL/CentOS 8 Linux were helpful. Let me know your suggestions and feedback using the comment section.
References:
Red Hat: How can I make my highly available resources dependent upon clone resources in RHEL 7 and RHEL 8 with pacemaker?
Red Hat: How do I create a promotable clone resource in a Pacemaker cluster?