Before starting with the GFS2 file system setup on a Red Hat or CentOS cluster, you should be familiar with:

 

→ What is a cluster, its architecture and types?

→ What are cluster resources and constraints?

→ How to set up a Red Hat or CentOS 7 cluster?

→ If you have only two nodes in your cluster, you need to follow some additional steps to set up a two-node cluster.

→ If your requirement is to share an ext4 or XFS based file system, you can also share LVM across the cluster without a GFS2 file system.

→ GFS2 requires shared storage, so if none is available you must manually create shared storage using an iSCSI target (targetcli) on a RHEL or CentOS Linux machine.

 

How to set up GFS2 with clustering on Linux (RHEL / CentOS 7)

 

I wrote an article long ago on setting up a cluster with the GFS2 file system on RHEL 6, but those steps are not valid for RHEL / CentOS 7.

 

I am using Oracle VirtualBox running on a Windows 10 laptop for the demonstration in this article. I configured my shared storage using an iSCSI target (targetcli) in my previous article, so I will use the same storage target for this cluster setup. You can follow my older articles if you do not have a cluster setup ready.

In this article we will create multiple cluster resources and order their start-up sequence using constraints. It is very important that these resources start in a pre-defined order, or else they will fail to start.

So let us start with the steps to configure a GFS2 file system on a Red Hat or CentOS 7 cluster.

 

Why do we need a cluster file system?

  • In some cases, it makes sense to use a cluster-aware file system.
  • The purpose of a cluster-aware file system is to allow multiple nodes to write to the file system simultaneously.
  • The default cluster-aware file system on the SUSE Linux Enterprise Server is OCFS2, and on Red Hat, it is Global File System (GFS) 2.
  • The file system does this by immediately synchronizing caches between the nodes on which the file system resource is running, which means every node always has the current state of exactly what is happening on the file system.
  • Typically, you’ll need them in active/active scenarios, where multiple instances of the same resource are running on multiple nodes and are all active.
  • You don’t have to create a cluster file system if you only want to run one instance of a resource at a time.

 

Any disadvantage of using cluster filesystem?

Apart from the benefits, there are also disadvantages to using a cluster file system. The most important is that the cache has to be synchronized between all nodes involved, which makes a cluster file system slower than a stand-alone file system in many cases, especially those that involve a lot of metadata operations. Because cluster file systems also couple the nodes much more tightly, it becomes harder for the cluster to prevent faults from spreading.

It is often believed that a cluster file system provides an advantage in failover times compared to a local file system, because it is already mounted. However, this is not true; the file system is still paused until fencing/STONITH and journal recovery for the failed node have completed, and this freezes the clustered file system on all nodes. It is actually a set of independent local file systems that provides higher availability! Clustered file systems should be used where they are required, but only after careful planning.

 

Prerequisites to set up a GFS2 file system

Below are the mandatory requirements on your cluster before you start working with GFS2:

  • CLVM (Clustered Logical Volume Manager)
  • DLM (Distributed Lock Manager)

 

It is important that your cluster setup is configured with fencing/STONITH.

We have enabled fencing here on our cluster. You can enable it using "pcs property set stonith-enabled=true".

[root@node1 ~]# pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: mycluster
 dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
 have-watchdog: false
 last-lrm-refresh: 1546059766
 no-quorum-policy: freeze
 stonith-enabled: true

Below you can see the cluster status. Here I have three fencing devices configured:

[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.example.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sat Dec 29 10:33:16 2018
Last change: Sat Dec 29 10:33:01 2018 by root via cibadmin on node1.example.com

3 nodes configured
3 resources configured

Online: [ node1.example.com node2.example.com node3.example.com ]

Full list of resources:
 fence-vm1      (stonith:fence_xvm):    Started node2.example.com
 fence-vm2      (stonith:fence_xvm):    Started node1.example.com
 fence-vm3      (stonith:fence_xvm):    Started node3.example.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@node1 ~]# pcs stonith show
 fence-vm1      (stonith:fence_xvm):    Started node2.example.com
 fence-vm2      (stonith:fence_xvm):    Started node2.example.com
 fence-vm3      (stonith:fence_xvm):    Started node2.example.com

Install gfs2-utils, lvm2-cluster and dlm on all your cluster nodes if not already installed:

# yum -y install gfs2-utils lvm2-cluster dlm

Change the pcs property no-quorum-policy to freeze. This property is necessary because it means that cluster nodes will do nothing after losing quorum, and this is required for GFS2:

# pcs property set no-quorum-policy=freeze

With the default setting of stop, once quorum is lost the cluster would try to unmount the GFS2 file system; without quorum the unmount cannot complete, the stop fails, and the entire cluster ends up fenced.
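Since getting this property wrong is what leads to cluster-wide fencing, it is worth checking before going any further. Below is a minimal sketch that extracts the policy from `pcs property show` output; here the output captured above is used as sample text, but on a live cluster you would pipe `pcs property show` directly into the same awk filter.

```shell
# Sample text taken from the `pcs property show` output above;
# on a live node replace this with: pcs_output=$(pcs property show)
pcs_output=' no-quorum-policy: freeze
 stonith-enabled: true'

# Extract the value of no-quorum-policy (second whitespace-separated field)
policy=$(printf '%s\n' "$pcs_output" | awk '/no-quorum-policy/ {print $2}')

if [ "$policy" = "freeze" ]; then
    echo "no-quorum-policy is freeze - safe for GFS2"
else
    echo "WARNING: no-quorum-policy is '$policy' - set it to freeze before using GFS2"
fi
```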

 

Configure DLM Resource

The Distributed Lock Manager (DLM), managed by the controld resource agent, is a mandatory part of the cluster. If it fails a monitor check after starting, the node on which it failed needs to be fenced to keep the cluster clean. That is also necessary to make sure nothing bad happens in combination with the no-quorum-policy, which is set to freeze.

NOTE:
As with the GFS2 file system itself, these resources have to be started on all nodes that require access to the file system. Pacemaker provides the clone resource for this purpose. Clone resources can be used for any resource that has to be active on multiple nodes simultaneously.
[root@node1 ~]# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true

Check the pcs cluster status

[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.example.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sat Dec 29 10:57:58 2018
Last change: Sat Dec 29 10:57:52 2018 by root via cibadmin on node1.example.com

3 nodes configured
6 resources configured

Online: [ node1.example.com node2.example.com node3.example.com ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node1.example.com node2.example.com node3.example.com ]
 fence-vm1      (stonith:fence_xvm):    Started node2.example.com
 fence-vm2      (stonith:fence_xvm):    Started node2.example.com
 fence-vm3      (stonith:fence_xvm):    Started node2.example.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

So our dlm resource and its dlm-clone have started properly on all our cluster nodes.

 

Configure CLVMD resource

  • If multiple nodes of the cluster require simultaneous read/write access to LVM volumes in an active/active system, then you must use CLVMD.
  • CLVMD provides a system for coordinating activation of and changes to LVM volumes across nodes of a cluster concurrently.
  • CLVMD’s clustered-locking service provides protection to LVM metadata as various nodes of the cluster interact with volumes and make changes to their layout.

To enable clustered locking, set locking_type=3 in /etc/lvm/lvm.conf:

[root@node1 ~]# grep locking_type /etc/lvm/lvm.conf | egrep -v '#'
    locking_type = 3
IMPORTANT NOTE:
This is the reason HA-LVM and CLVM are not compatible: HA-LVM requires locking_type 1 while CLVMD requires locking_type 3.

You can change this dynamically using the below command:

# lvmconf --enable-cluster

Disable and stop the lvm2-lvmetad service:

# systemctl disable lvm2-lvmetad --now
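Both of these per-node steps (lvmconf --enable-cluster and disabling lvm2-lvmetad) must be repeated on every cluster node. A sketch of applying them in one pass is below; the node names are the ones used in this cluster, and DRY_RUN=1 only prints the ssh commands instead of executing them, so you can review before running for real.

```shell
# DRY_RUN=1 prints the commands; set DRY_RUN=0 to actually run them over ssh
# (assumes passwordless root ssh between the nodes)
DRY_RUN=1
NODES="node1.example.com node2.example.com node3.example.com"

for node in $NODES; do
    for cmd in "lvmconf --enable-cluster" \
               "systemctl disable lvm2-lvmetad --now"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$node $cmd"
        else
            ssh "root@$node" "$cmd"
        fi
    done
done
```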

Next, create the clvmd resource:

[root@node1 ~]# pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true

Validate the resource status:

[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.example.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sat Dec 29 10:57:58 2018
Last change: Sat Dec 29 10:57:52 2018 by root via cibadmin on node1.example.com

3 nodes configured
9 resources configured

Online: [ node1.example.com node2.example.com node3.example.com ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node1.example.com node2.example.com node3.example.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1.example.com node2.example.com node3.example.com ]
 fence-vm1      (stonith:fence_xvm):    Started node2.example.com
 fence-vm2      (stonith:fence_xvm):    Started node2.example.com
 fence-vm3      (stonith:fence_xvm):    Started node2.example.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

 

Change resource start up order

Now we need an ordering and a colocation constraint as well. The colocation constraint makes sure that the clvmd clone is always kept together with the dlm clone, and the ordering constraint makes sure dlm starts before clvmd.

[root@node1 ~]# pcs constraint order start dlm-clone then clvmd-clone
Adding dlm-clone clvmd-clone (kind: Mandatory) (Options: first-action=start then-action=start)
[root@node1 ~]# pcs constraint colocation add clvmd-clone with dlm-clone
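The same ordering-plus-colocation pair will be needed again later for the GFS2 file system resource, so the pattern is worth capturing. A small sketch with a hypothetical helper function (not part of pcs) that emits both constraint commands for any first/dependent resource pair; with DRY_RUN=1 the pcs commands are only printed, which also makes them easy to review:

```shell
# Hypothetical helper: prints (or runs) the ordering and colocation
# constraints that bind a dependent clone to the clone it relies on.
DRY_RUN=1

order_and_colocate() {
    first=$1
    dependent=$2
    for cmd in "pcs constraint order start $first then $dependent" \
               "pcs constraint colocation add $dependent with $first"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

# The pair configured above:
order_and_colocate dlm-clone clvmd-clone
```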

 

Set up shared storage on cluster nodes

From my previous article, I am using an iSCSI target on all of my cluster nodes, which I will use to set up my cluster file system (GFS2).

So after connecting to my storage node, I have /dev/sdc available on all my cluster nodes.

[root@node1 ~]# ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Dec 29 09:47 /dev/sda
brw-rw---- 1 root disk 8,  1 Dec 29 09:47 /dev/sda1
brw-rw---- 1 root disk 8,  2 Dec 29 09:47 /dev/sda2
brw-rw---- 1 root disk 8, 16 Dec 29 09:47 /dev/sdb
brw-rw---- 1 root disk 8, 17 Dec 29 09:47 /dev/sdb1
brw-rw---- 1 root disk 8, 32 Dec 29 10:30 /dev/sdc

I will set up a logical volume on /dev/sdc on one of my cluster nodes. The same configuration will automatically be synced to all other cluster nodes.

[root@node1 ~]# pvcreate /dev/sdc
  Physical volume "/dev/sdc" successfully created.
[root@node1 ~]# vgcreate -Ay -cy vgclvm /dev/sdc
  Clustered volume group "vgclvm" successfully created

Here

  • -A|--autobackup y|n : specifies whether metadata should be backed up automatically after a change.
  • -c|--clustered y|n : creates a clustered VG using clvmd if LVM is compiled with cluster support. This allows multiple hosts to share a VG on shared devices. clvmd and a lock manager must be configured and running.

Display the available volume groups:

[root@node1 ~]# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  centos   2   2   0 wz--n- <17.52g 1020.00m
  vgclvm   1   0   0 wz--nc 992.00m  992.00m

Create a new logical volume using our shared volume group:

[root@node1 ~]# lvcreate -l 100%FREE -n lvcluster vgclvm
  Logical volume "lvcluster" created.

Create a GFS2 file system on our logical volume.

[root@node1 ~]# mkfs.gfs2 -j3 -p lock_dlm -t mycluster:gfs2fs /dev/vgclvm/lvcluster
/dev/vgclvm/lvcluster is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Are you sure you want to proceed? [y/n] y
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done

Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/vgclvm/lvcluster
Block size:                4096
Device size:               0.97 GB (253952 blocks)
Filesystem size:           0.97 GB (253951 blocks)
Journals:                  3
Journal size:              8MB
Resource groups:           7
Locking protocol:          "lock_dlm"
Lock table:                "mycluster:gfs2fs"
UUID:                      da1e5aa6-51a3-4512-ba79-3e325455007e

Here

  • -t clustername:fsname : specifies the name of the locking table
  • -j nn : specifies how many journals are created (one per node that will mount the file system)
  • -J : specifies the journal size. If not specified, a journal has a default size of 128 MB. The minimum size is 8 MB (not recommended)
NOTE:
In the command, clustername must be the pacemaker cluster name; I have used mycluster, which is my cluster name.
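Because a lock table name that does not match the pacemaker cluster name will prevent the file system from mounting, it can help to assemble the mkfs.gfs2 command from named parameters rather than typing it inline. A minimal sketch, using the values from this article (the variable names are illustrative):

```shell
CLUSTER=mycluster              # must match the pacemaker cluster name exactly
FSNAME=gfs2fs                  # any name, unique within this cluster
JOURNALS=3                     # one journal per node that will mount the fs
DEVICE=/dev/vgclvm/lvcluster

# The lock table is always clustername:fsname
LOCK_TABLE="$CLUSTER:$FSNAME"

# Print the resulting command for review before running it
echo "mkfs.gfs2 -j$JOURNALS -p lock_dlm -t $LOCK_TABLE $DEVICE"
```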

 

Create mount point and validate

Now our logical volume is created successfully. Next, let us create a mount point for our file system.

NOTE:
Manually create this mount point on all the cluster nodes
# mkdir /clusterfs

 

Before we create a resource for GFS2, let us manually validate that the file system on lvcluster works properly.

[root@node1 ~]# mount /dev/vgclvm/lvcluster /clusterfs/

Validate the same

[root@node1 ~]# mount | grep clusterfs
/dev/mapper/vgclvm-lvcluster on /clusterfs type gfs2 (rw,noatime)

So the file system was mounted successfully.
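If you want to script this check rather than eyeball the output, the `mount` line can be parsed for the file system type and options. A sketch, using the line captured above as sample input; on a live node you would feed it `mount | grep /clusterfs` instead:

```shell
# Sample line taken from the `mount` output above
mount_line='/dev/mapper/vgclvm-lvcluster on /clusterfs type gfs2 (rw,noatime)'

# Field 5 of the standard mount output is the filesystem type
fstype=$(printf '%s\n' "$mount_line" | awk '{print $5}')

# Check whether the noatime option (recommended for GFS2) is present
case $mount_line in
    *noatime*) atime=noatime ;;
    *)         atime=default ;;
esac

echo "fstype=$fstype options=$atime"
```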

 

Create GFS2FS cluster resource

Now we can create a resource for gfs2fs for our GFS2 file system.

[root@node1 ~]# pcs resource create gfs2fs Filesystem device="/dev/vgclvm/lvcluster" directory="/clusterfs" fstype=gfs2 options=noatime op monitor interval=10s on-fail=fence clone interleave=true
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

Validate the cluster status

[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node1.example.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sat Dec 29 10:58:08 2018
Last change: Sat Dec 29 10:57:52 2018 by root via cibadmin on node1.example.com

3 nodes configured
12 resources configured

Online: [ node1.example.com node2.example.com node3.example.com ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node1.example.com node2.example.com node3.example.com ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node1.example.com node2.example.com node3.example.com ]
 fence-vm1      (stonith:fence_xvm):    Started node2.example.com
 fence-vm2      (stonith:fence_xvm):    Started node2.example.com
 fence-vm3      (stonith:fence_xvm):    Started node2.example.com
 Clone Set: gfs2fs-clone [gfs2fs]
     Started: [ node1.example.com node2.example.com node3.example.com ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

So our gfs2fs service is started automatically on all our cluster nodes.

 

Now arrange the resource start-up order for GFS2 and CLVMD so that after a node reboot the services start in the proper order; otherwise they will fail to start.

[root@node1 ~]# pcs constraint order start clvmd-clone then gfs2fs-clone
Adding clvmd-clone gfs2fs-clone (kind: Mandatory) (Options: first-action=start then-action=start)

[root@node1 ~]# pcs constraint colocation add gfs2fs-clone with clvmd-clone

 

Validate our Cluster with GFS2 file system

Now that our resources are running properly on all cluster nodes, let us create a file on one of them.

[root@node1 ~]# cd /clusterfs/
[root@node1 clusterfs]# touch file

Now connect to any other cluster node; this file should exist there as well:

[root@node2 ~]# ls /clusterfs/
file

So our Cluster with GFS2 file system configuration is working as expected.

 
