Friday, January 22, 2010

A HA Two-Node Server Side Cluster Using Glusterfs and CentOS

SkyHi @ Friday, January 22, 2010
I came across GlusterFS the other day. On the surface it seemed similar to DRBD, but after closer examination I realized it is completely different. After some reading I came to believe it may offer benefits over DRBD and GNBD, and it seemed extremely straightforward to implement, so I decided to test a two-node cluster and run Apache on it.
Network Setup

The logical network setup is basically a two-node cluster using the server-side replication capability (as opposed to client-side replication). In this fashion the client(s) which mount the exported volumes only need to worry about serving the data through an application, Apache in this case.

The HA will be achieved by using round-robin DNS (RRDNS) from the client to the servers; that is, when the client issues a request to the servers it does so using an FQDN for the cluster, and if one of the server nodes is down it will switch to the other node. When the failed node comes back, the self-healing properties of GlusterFS will ensure that data is replicated from the remaining node.

The nodes will be configured as follows:
    * Node 0: 10.0.0.1 (client)
    * Node 1: 10.0.0.2
    * Node 2: 10.0.0.3
    * Node 1 and node 2 will have /ha exported
    * The client will mount /ha
For my example I only have one client, but this setup is easily extendable to two or more clients as well as two or more servers. My setup is of course nothing new; it is based on several examples at the GlusterFS wiki. I took a couple of them and redid the configuration to suit my test needs. In particular you should look at:
    * High-availability storage using server-side AFR by Daniel Maher
    * Setting up AFR on two servers with server side replication by Brandon Lamb
Prep Work

Glusterfs has some dependencies:
   1. Infiniband support if you use an infiniband network
   2. Berkeley DB support
   3. Extended attributes support for the backend system (exported filesystem) ext3 in our case.
   4. FUSE is the primary requirement for Glusterfs.
I will be using CentOS 5.2; the first three requirements come standard with it, but FUSE does not, so you need to install it.

Install FUSE

FUSE is not available from the standard repositories, so you need to get it from RPMforge.

Follow the instructions to install support for the repository in YUM. Make sure to also configure YUM to use the priorities plug-in; this ensures that the standard repositories are preferred over the RPMforge repositories if you use automatic updates, or if you install an update for a particular package, so nothing on your system gets broken.
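On CentOS 5 this typically boils down to two steps; the rpmforge-release version and URL below are placeholders that change over time, so check the RPMforge site for the current one:

# rpm -Uvh http://packages.sw.be/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
# yum -y install yum-priorities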

When you have the repository support installed, issue the following at a command prompt:
# yum -y install fuse fuse-devel
The command above will install FUSE and its libraries plus any other packages it needs.

Install the FUSE Kernel Module

Make sure you have the kernel-devel package for your kernel. At a prompt issue:
# yum info kernel-devel
You should see something like this:

Installed Packages
Name                 : kernel-devel
Arch                 : i686
Version         : 2.6.18
Release         : 92.el5
Size                 : 15 M
Repo                 : installed
Summary         : Development package for building kernel modules to match the kernel.
Description         : This package provides kernel headers and makefiles sufficient to build
                : modules against the kernel package.
If it says “installed” then you are OK; otherwise you need to install it. Issue:
yum -y install kernel-devel-2.6.18-92.el5 kernel-headers-2.6.18-92.el5 dkms-fuse
The command above will install the kernel headers and the source for the FUSE kernel module (dkms-fuse).
Change directories to /usr/src/fuse-2.7.4-1.nodist.rf and issue: ./configure; make install
This will install the fuse.ko kernel module. Finally, run chkconfig fuse on to enable FUSE support at boot, and start the FUSE service with service fuse start.
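In command form, those last two steps are:

# chkconfig fuse on
# service fuse start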

Repeat the above procedure on all 3 nodes (the client and the 2 servers).

Install Glusterfs

Download the latest release (1.3.9 at the time of writing) from the GlusterFS download page. If you are using a 64-bit architecture, get the corresponding RPMs; otherwise you will have to build the RPMs from the SRPM with: rpmbuild --rebuild glusterfs-1.3.9-1.src.rpm. This creates the following in /usr/src/redhat/RPMS/i386:
    * glusterfs-1.3.9-1.i386.rpm
    * glusterfs-devel-1.3.9-1.i386.rpm
    * glusterfs-debuginfo-1.3.9-1.i386.rpm
Copy the files to the other nodes and install the RPMs with rpm -ivh glusterfs*. Verify the installation was successful by issuing:
# glusterfs --version
No errors should be reported.
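Putting the 32-bit build-and-install sequence together, it looks roughly like this (the paths are the ones rpmbuild uses on CentOS 5):

# rpmbuild --rebuild glusterfs-1.3.9-1.src.rpm
# cd /usr/src/redhat/RPMS/i386
# rpm -ivh glusterfs*.rpm
# glusterfs --version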

Round-Robin DNS

A key component of the HA setup is RRDNS. Though it is used only in one instance, it is a critical function - one which helps to ensure that the data can be served continuously even in the event that one of the storage servers becomes inaccessible.

Normally, in a standard configuration, a client will access the servers via their IP addresses. The major drawback of this setup is that if a server becomes inaccessible, the client will be unable to access the data. This can be mitigated by using a hostname rather than addresses to access the servers.

Consider the following:
$ host node1.mycluster.com
node1.mycluster.com has address 10.0.0.2
$ host node2.mycluster.com
node2.mycluster.com has address 10.0.0.3
$ host cluster.mycluster.com
cluster.mycluster.com has address 10.0.0.2
cluster.mycluster.com has address 10.0.0.3
$ dig cluster.mycluster.com | grep -A 2 "ANSWER SECTION"
;; ANSWER SECTION:
cluster.mycluster.com. 3600 IN A 10.0.0.2
cluster.mycluster.com. 3600 IN A 10.0.0.3
So you need to configure the zone file for the mycluster.com zone to serve the corresponding records for all the server nodes and for the FQDN of the cluster. Configuration and setup of such a DNS zone is well documented on the Internet, so it is left as an exercise for the reader; a minimal illustrative fragment is shown below.
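As a rough sketch, the relevant records in a BIND-style zone file for mycluster.com might look something like this (TTLs and the rest of the zone omitted):

node1    IN A 10.0.0.2
node2    IN A 10.0.0.3
; two A records for the same name produce the round-robin behaviour
cluster  IN A 10.0.0.2
cluster  IN A 10.0.0.3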
Configure Glusterfs

Now that all the prep work, RPMs, and RRDNS are in place, we are ready to configure the cluster. The key piece of the setup is the “AFR translator”; this is the mechanism that replicates data across its subvolumes, i.e. between the servers.

The reader is encouraged to visit the Gluster wiki and go over the fundamentals of GlusterFS, in particular the performance options used in this setup (readahead, writeback, cache-size, etc.).

Node 1

The following is the configuration of node 1:
[root@node1 ~]# more /etc/glusterfs/glusterfs-server.vol
# Dataspace on Node1
volume gfs-ds
  type storage/posix
  option directory /ha
end-volume
# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume
# Dataspace on Node2
volume gfs-node2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.3           # IP address of node2
  option remote-subvolume gfs-ds-locks
  option transport-timeout 5            # value in seconds
end-volume
# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node2-ds
end-volume
# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  option auth.ip.gfs-ds-locks.allow 10.0.0.*,127.0.0.1
  option auth.ip.gfs.allow 10.0.0.*,127.0.0.1
end-volume
Node 2

The configuration of node 2:
[root@node2 ~]# more /etc/glusterfs/glusterfs-server.vol
# Dataspace on Node2
volume gfs-ds
  type storage/posix
  option directory /ha
end-volume
# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume
# Dataspace on Node1
volume gfs-node1-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.2           # IP address of node1
  option remote-subvolume gfs-ds-locks
  option transport-timeout 5            # value in seconds
end-volume
# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node1-ds
end-volume
# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  option auth.ip.gfs-ds-locks.allow 10.0.0.*,127.0.0.1
  option auth.ip.gfs.allow 10.0.0.*,127.0.0.1
end-volume
Client

Finally the client configuration:
[root@node0 ~]# more /etc/glusterfs/glusterfs-client.vol
# the exported volume to mount
volume cluster
  type protocol/client
  option transport-type tcp/client             # for TCP/IP transport
  option remote-host cluster.mycluster.com     # FQDN of the cluster
  option remote-subvolume gfs                  # exported volume
  option transport-timeout 10                  # value in seconds
end-volume
# performance block for cluster # optional!
volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume
# performance block for cluster # optional!
volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
Start Gluster on Servers and Clients

On both servers make sure:
    * /ha exists
    * If /ha is a mount point, the file system has been created (in our case ext3)
    * /ha is mounted on both servers
    * The configuration files exist
On the client make sure:
    * The fuse service is started and the kernel module is loaded
    * The client configuration exists
Make sure the FQDN cluster.mycluster.com resolves to both addresses (10.0.0.2, 10.0.0.3). Finally, make sure the clock is synchronized on all three nodes.
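As a rough sketch, the server-side checks could look like this (the device name /dev/sdb1 and the time server are assumptions for illustration; adjust to your hardware):

# mkdir -p /ha
# mkfs.ext3 /dev/sdb1          # only if /ha lives on its own partition
# mount /dev/sdb1 /ha
# host cluster.mycluster.com   # should return 10.0.0.2 and 10.0.0.3
# ntpdate pool.ntp.org         # or keep ntpd running on all nodes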

Start gluster on both servers:
[root@node2 ~]# glusterfsd -f /etc/glusterfs/glusterfs-server.vol
[root@node2 ~]# tail /var/log/glusterfsd.log
Start Gluster on the client and mount /ha:
[root@node0 ~]# glusterfs -f /etc/glusterfs/glusterfs-client.vol /ha
[root@node0 ~]# cd /ha
[root@node0 ~]# ls -l
Test that things are working ok by creating a new file:
[root@node0 ~]# cd /ha
[root@node0 ~]# touch file.txt
[root@node0 ~]# ls
file.txt
Now go to both servers and check whether the file was created under /ha on each. If the file exists on both, congratulations!
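A quick way to check from each server:

[root@node1 ~]# ls -l /ha/file.txt
[root@node2 ~]# ls -l /ha/file.txt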
Configure Apache

The configuration of Apache is identical to the one we did for our two-node GNBD cluster. Copy all the web files to the new document root, which will be /ha, then on the client (node 0) change the Apache configuration to point to the new document root (i.e. /ha).
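On CentOS the relevant part of /etc/httpd/conf/httpd.conf ends up looking roughly like this (a sketch; keep whatever other directives you already have):

DocumentRoot "/ha"
<Directory "/ha">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>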

When done you should be able to go to the IP of the client via a web browser and get the Apache default page.
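From any machine that can reach the client, a quick command-line check could be:

$ curl -I http://10.0.0.1/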
Testing Failover

To test failover, simulate a failure of one of the nodes; for our test we will choose node 1. So issue:
[root@node1 ~]# killall glusterfsd
Wait a few seconds and then browse to the IP of node 0. You should still get a response; if you do, congratulations once more! You have a working two-node cluster.

Test recovery by restarting Gluster on node 1. Before you do, create a file on the client:
[root@node0 ~]# echo "testing failure" > /ha/test.txt
The file should appear on node 2, but node 1 will not have it. Start Gluster on node 1; after it starts you will notice that “test.txt” still does not appear under /ha on node 1 even though the cluster is once again up.

Self-healing will eventually synchronize the files from node 2 to node 1 as the new files are accessed through the GlusterFS mount. If you want to force self-healing to happen, use a script that accesses the changed files through the mount, or simply run the following on the client:
[root@node0 ~]# ls -lR /ha
The above will force self-healing; after you do this, “test.txt” will appear on node 1.
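If you would rather touch every changed file explicitly instead of relying on a directory listing, a small sketch like the following, run against the client mount, does the same job:

# find /ha -type f -exec head -c 1 {} \; > /dev/null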
Conclusions

The setup presented here gives you a two-node cluster using server-to-server replication and a client that accesses the data. The setup can easily scale up to two or more servers and two or more clients.

Of course in a production environment you should:

    * Use bonded interfaces on all nodes (servers and clients).
    * Use a dedicated network for cluster communication; see the article by Daniel Maher referenced at the beginning of this post.
    * Use 64-bit servers with dedicated storage for each; this will improve performance.
    * Gluster recommends using client-side replication instead of server-side replication; however, I believe there are advantages to server-side replication, since it frees the client to do nothing other than what a client should do: access data.
    * Release 4.0 has added HA as a translator, so IP addresses can now also be used as elements for failover. With this new translator, having an internal DNS to resolve the FQDN for the cluster becomes a non-issue, since you will be able to use the HA IP address of the cluster instead.
    * Use CARP or Heartbeat to give you additional HA on the client side.

Finally, a setup like this gives you an inexpensive way of creating a cluster without the cost of, for example, a SAN. The Gluster wiki has some examples of people using it in production environments.

http://blog.miguelsarmiento.com/ ... lusterfs-and-centos

REFERENCE
http://www.thismail.org/bbs/viewthread.php?tid=3180