This is a two-part article. Here I will share the steps to configure OpenStack High Availability (HA) between two controllers; in the second part I will share the steps to configure HAProxy and move the Keystone service endpoints behind the load balancer. If you bring up a controller and compute node using a TripleO deployment, the controllers are configured as a Pacemaker cluster by default. But if you bring up your OpenStack setup manually, using Packstack, DevStack, or by creating all the databases and services by hand, then you will have to configure the cluster between the controllers yourself to achieve OpenStack High Availability (HA).
Configure OpenStack High Availability (HA)
For the sake of this article I brought up two controller nodes with Packstack on two different virtual machines running CentOS 7 on Oracle VirtualBox, installed on my Linux server. After Packstack completes successfully you will find a keystonerc_admin file in the home directory of the root user.
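As a quick sanity check (assuming python-openstackclient was installed by Packstack, which it normally is), you can source that file and confirm the services are registered:

[root@controller1 ~]# source ~/keystonerc_admin
[root@controller1 ~]# openstack service list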
Installing the Pacemaker resource manager
Since we will configure OpenStack High Availability using Pacemaker and Corosync, we first need to install all the RPMs required for the cluster setup. Pacemaker will manage the VIPs that we will later use with HAProxy to make the web services highly available.
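For context, a Pacemaker-managed VIP is simply an ocf:heartbeat:IPaddr2 resource. The following is only a minimal sketch of the kind of resource we will eventually add; the address 192.168.122.30 is a placeholder on the controllers' subnet, and the real VIP is created in the HAProxy part of this series:

pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.122.30 cidr_netmask=24 \
    op monitor interval=30s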
So install pacemaker on all the controller nodes
[root@controller2 ~]# yum install -y pcs fence-agents-all

[root@controller1 ~]# yum install -y pcs fence-agents-all
Verify that the software installed correctly by running the following command:
[root@controller1 ~]# rpm -q pcs
pcs-0.9.162-5.el7.centos.2.x86_64

[root@controller2 ~]# rpm -q pcs
pcs-0.9.162-5.el7.centos.2.x86_64
Next, add rules to the firewall to allow cluster traffic:
[root@controller1 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@controller1 ~]# firewall-cmd --reload
success

[root@controller2 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@controller2 ~]# firewall-cmd --reload
success
If you run into any problems during testing, you might want to disable the firewall and SELinux entirely until you have everything working. This may create significant security issues and should not be performed on machines that will be exposed to the outside world, but may be appropriate during development and testing on a protected host.
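If you do take that shortcut on a lab host, the usual commands are (run them on the affected node, and remember to revert them afterwards):

systemctl stop firewalld
systemctl disable firewalld
setenforce 0        # SELinux permissive until the next reboot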
The installed packages will create a hacluster user with a disabled password. While this is fine for running pcs commands locally, the account needs a login password in order to perform such tasks as syncing the corosync configuration, or starting and stopping the cluster on other nodes.
Set the password for the Pacemaker cluster on each controller node using the following command:
[root@controller1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

[root@controller2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Start the Pacemaker cluster manager on each node:
[root@controller1 ~]# systemctl start pcsd.service
[root@controller1 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

[root@controller2 ~]# systemctl start pcsd.service
[root@controller2 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
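pcsd listens on TCP port 2224 (the port opened by the high-availability firewalld service above), so you can quickly confirm it is running and reachable, for example:

[root@controller1 ~]# systemctl is-active pcsd.service
[root@controller1 ~]# ss -tnlp | grep 2224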
Configure Corosync
To configure OpenStack High Availability we need to configure corosync on both the nodes. Use pcs cluster auth to authenticate as the hacluster user:
[root@controller1 ~]# pcs cluster auth controller1 controller2
Username: hacluster
Password:
controller2: Authorized
controller1: Authorized

[root@controller2 ~]# pcs cluster auth controller1 controller2
Username: hacluster
Password:
controller2: Authorized
controller1: Authorized
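Note that pcs cluster auth addresses the nodes by hostname, so controller1 and controller2 must resolve on both machines. If you are not using DNS, a minimal /etc/hosts sketch (assuming the node IPs shown in the corosync output later in this article) would be:

192.168.122.20   controller1
192.168.122.22   controller2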
Finally, run the following commands on the first node to create the cluster and start it. Here our cluster name will be openstack
[root@controller1 ~]# pcs cluster setup --start --name openstack controller1 controller2
Destroying cluster on nodes: controller1, controller2...
controller1: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'controller1', 'controller2'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
Starting cluster on nodes: controller1, controller2...
controller1: Starting Cluster...
controller2: Starting Cluster...
Synchronizing pcsd certificates on nodes controller1, controller2...
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller2: Success
controller1: Success
Enable the pacemaker and corosync services on both controllers so they start automatically on boot:
[root@controller1 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@controller1 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.

[root@controller2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@controller2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
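You can confirm both units are set to start on boot with:

[root@controller1 ~]# systemctl is-enabled corosync pacemaker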
Validate cluster using pacemaker
Verify that the cluster started successfully using the following command on both the nodes:
[root@controller1 ~]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 11:51:13 2018
Last change: Tue Oct 16 11:50:51 2018 by root via cibadmin on controller1

2 nodes configured
0 resources configured

Online: [ controller1 controller2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@controller2 ~]# pcs status
Cluster name: openstack
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Oct 15 17:04:29 2018
Last change: Mon Oct 15 16:49:09 2018 by hacluster via crmd on controller2

2 nodes configured
0 resources configured

Online: [ controller1 controller2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
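If you prefer a live view while the cluster settles, crm_mon (shipped with the pacemaker package) keeps refreshing the same information; use -1 for a one-shot snapshot:

[root@controller1 ~]# crm_mon -1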
How to start the Cluster
Now that corosync is configured, it is time to start the cluster. The command below will start corosync and pacemaker on both nodes in the cluster. If you are issuing the start command from a different node than the one you ran the pcs cluster auth command on earlier, you must authenticate on the current node you are logged into before you will be allowed to start the cluster.
[root@controller1 ~]# pcs cluster start --all
An alternative to using the pcs cluster start --all command is to issue either of the below command sequences on each node in the cluster separately:
[root@controller1 ~]# pcs cluster start
Starting Cluster...
or
[root@controller1 ~]# systemctl start corosync.service
[root@controller1 ~]# systemctl start pacemaker.service
Verify Corosync Installation
First, use corosync-cfgtool to check whether cluster communication is happy:
[root@controller2 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.122.22
status = ring 0 active with no faults
So all looks normal with our fixed IP address (not a 127.0.0.x loopback address) listed as the id, and no faults for the status.
If you see something different, you might want to start by checking the node’s network, firewall and SELinux configurations.
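A few quick checks that usually narrow the problem down (run them on the affected node):

ip addr show                   # is the expected fixed IP configured on an interface?
firewall-cmd --list-services   # is the high-availability service allowed?
getenforce                     # is SELinux enforcing and possibly blocking traffic?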
Next, check the membership and quorum APIs:
[root@controller2 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.20)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.22)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
Check the status of the corosync service:
[root@controller2 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 controller1
2 1 controller2 (local)
You should see both nodes have joined the cluster.
Repeat the same steps on both controllers to validate the corosync services.
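Optionally, corosync-quorumtool gives a slightly more detailed quorum view than pcs status corosync:

[root@controller1 ~]# corosync-quorumtool -s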
Verify the cluster configuration
Before we make any changes, it’s a good idea to check the validity of the configuration.
[root@controller1 ~]# crm_verify -L -V
   error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
As you can see, the tool has found some errors.
In order to guarantee the safety of your data, fencing (also called STONITH) is enabled by default. However, the cluster also knows when no STONITH configuration has been supplied and reports this as a problem (since it will not be able to make progress if a situation requiring node fencing arises).
We will disable this feature for now and configure it later. To disable STONITH, set the stonith-enabled cluster option to false on both controller nodes:
[root@controller1 ~]# pcs property set stonith-enabled=false
[root@controller1 ~]# crm_verify -L

[root@controller2 ~]# pcs property set stonith-enabled=false
[root@controller2 ~]# crm_verify -L

With the new cluster option set, the configuration is now valid.
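You can double-check that the property took effect with:

[root@controller1 ~]# pcs property list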
stonith-enabled=false is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely powered off. Some vendors will refuse to support clusters that have STONITH disabled.
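When you do come back to fencing, a production setup defines one STONITH resource per node against a real power-management device. Purely as an illustration (the IPMI address and credentials below are placeholders, and the right fence agent depends on your hardware), it would look something like:

pcs stonith create fence-controller1 fence_ipmilan pcmk_host_list="controller1" \
    ipaddr="192.168.100.1" login="admin" passwd="secret" op monitor interval=60s
pcs property set stonith-enabled=true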
I will continue this OpenStack High Availability configuration in a separate part. In the next part I will share the steps to configure HAProxy and manage it as a cluster resource, along with the detailed steps to move the OpenStack API endpoints behind the cluster load balancer.