Distributed and redundant network file systems using GlusterFS on CentOS 5

SkyHi @ Friday, January 22, 2010
My notes on how to install GlusterFS on CentOS 5 servers. This example uses four Gluster servers (named gluster1 through gluster4) in a 2 by 2 distributed-replicated setup, i.e. two copies of the data for redundancy, distributed over two replica pairs for twice the capacity of a single server. This work was actually carried out on a cluster of KVM virtual servers (if that makes a difference to you). This is not a definitive guide, but rather a note of what I did to get it working. The only undocumented problem I hit is addressed in step three. The main source of the documentation is here.

1. Install the GlusterFS RPMs from http://ftp.gluster.com/pub/gluster/glusterfs/2.0/LATEST/CentOS/ (note that the client machine you will be accessing this cluster from doesn't need the glusterfs-server package). A sketch of the download and install commands follows the list.

  • glusterfs-client-2.0.6-1.i386.rpm
  • glusterfs-common-2.0.6-1.i386.rpm
  • glusterfs-server-2.0.6-1.i386.rpm
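
For reference, fetching and installing them on each Gluster server looks something like this (a sketch only; the exact file names under LATEST/CentOS may differ from the ones above, so check the directory listing first):

# Run as root on each gluster server (on the client, drop the server RPM)
BASE=http://ftp.gluster.com/pub/gluster/glusterfs/2.0/LATEST/CentOS
wget $BASE/glusterfs-common-2.0.6-1.i386.rpm \
     $BASE/glusterfs-client-2.0.6-1.i386.rpm \
     $BASE/glusterfs-server-2.0.6-1.i386.rpm
# Installing all three in one transaction lets rpm resolve the dependency
# on glusterfs-common itself
rpm -ivh glusterfs-*-2.0.6-1.i386.rpm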

2. Install the FUSE kernel module via DKMS, which is available in the RPMforge repository:

  • rpm -ivh http://packages.sw.be/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
  • yum install dkms-fuse

3. HACK: DKMS wasn't working for me because it couldn't find the kernel headers it needed, so a quick symlink sorted it out (the version numbers below match my kernel; adjust them to yours):

  • cd /usr/src/kernels && ln -s 2.6.18-128.7.1.el5-i686 2.6.18-128.el5-i686
  • /etc/init.d/dkms_autoinstaller start
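
Either way, it's worth a quick check that the FUSE module actually built and loads before moving on (standard dkms/modprobe tooling; the exact output will vary with your kernel):

# The fuse module should show as installed for the running kernel
dkms status
modprobe fuse && lsmod | grep fuse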

4. Now move on to the Gluster server configuration (remember to create /data/export). Put the following server config file in place at /etc/glusterfs/glusterfsd.vol; a sketch of creating the directory and copying the file to the other servers follows the config.

# file: /etc/glusterfs/glusterfsd.vol
volume posix
type storage/posix
option directory /data/export
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume brick
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume

volume server
type protocol/server
option transport-type tcp
option auth.addr.brick.allow *
subvolumes brick
end-volume
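
The same glusterfsd.vol is used on all four servers, and each one needs the /data/export directory that the posix volume points at. A minimal sketch of doing that from gluster1, assuming root SSH access between the nodes:

# On gluster1: create the export directory locally, then push the config
# out and create the directory on the other three servers
mkdir -p /data/export
for h in gluster2 gluster3 gluster4; do
    scp /etc/glusterfs/glusterfsd.vol root@$h:/etc/glusterfs/
    ssh root@$h 'mkdir -p /data/export'
done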

5. Restart the GlusterFS server daemon on each of the four servers with:

/etc/init.d/glusterfsd restart
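
It's also worth making sure the daemon comes back after a reboot. Assuming the init script shipped with the RPM carries the usual chkconfig header, the standard CentOS 5 incantation applies:

# Enable the server daemon at boot on each gluster server
chkconfig --add glusterfsd
chkconfig glusterfsd on
chkconfig --list glusterfsd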

6. Now on the client (i.e. whichever machine you want to access this data from), add the following configuration as /etc/glusterfs/glusterfs-client.vol, remembering to change the remote-host options to match your server hostnames or IP addresses.

# file: /etc/glusterfs/glusterfs-client.vol
volume remote1
type protocol/client
option transport-type tcp
option remote-host gluster1
option remote-subvolume brick
end-volume

volume remote2
type protocol/client
option transport-type tcp
option remote-host gluster2
option remote-subvolume brick
end-volume

volume remote3
type protocol/client
option transport-type tcp
option remote-host gluster3
option remote-subvolume brick
end-volume

volume remote4
type protocol/client
option transport-type tcp
option remote-host gluster4
option remote-subvolume brick
end-volume

volume replicate1
type cluster/replicate
subvolumes remote1 remote2
end-volume

volume replicate2
type cluster/replicate
subvolumes remote3 remote4
end-volume

volume distribute
type cluster/distribute
subvolumes replicate1 replicate2
end-volume

volume writebehind
type performance/write-behind
option window-size 1MB
subvolumes distribute
end-volume

volume cache
type performance/io-cache
option cache-size 512MB
subvolumes writebehind
end-volume

7. Now on the client you can manually mount the GlusterFS share with:

glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs

or automatically in your fstab file with this line:

/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs glusterfs defaults 0 0
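
Whichever way you mount it, the mount point has to exist first, and a quick look confirms the 2 by 2 layout is behaving as expected (a sketch; the fstab method assumes the mount.glusterfs helper from the client RPM is in place, and the df figure depends on your brick sizes):

# On the client
mkdir -p /mnt/glusterfs
mount /mnt/glusterfs        # picks up the fstab entry above
df -h /mnt/glusterfs        # capacity should be roughly two bricks' worth, not four
mount | grep gluster        # confirms the fuse/glusterfs mount is live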

8. Now there are various tests you can do to try it out. I copied a load of RPMs from a repo mirror into /mnt/glusterfs and then stopped networking on the gluster4 node. When I brought it back online later, the moment I read from the mount the missing data was re-synced to gluster4, as if nothing had happened. Impressive! A rough sketch of the test follows.
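
For what it's worth, the test amounted to something like this (a rough sketch; the seed data path is just an example, any sizeable set of files will do):

# Run from the client unless a comment says otherwise
cp /srv/mirror/*.rpm /mnt/glusterfs/            # seed the volume with some data
md5sum /mnt/glusterfs/*.rpm > /tmp/before.md5
# ...on gluster4's console: /etc/init.d/network stop
md5sum -c /tmp/before.md5                       # still passes: gluster3 holds the other replica
# ...later, on gluster4's console: /etc/init.d/network start
ls -lR /mnt/glusterfs > /dev/null               # reading the mount triggers the self-heal
# ...on gluster4, /data/export should now contain the files it missed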

REFERENCE

http://blog.coopersphotos.co.uk/technology/distributed-and-redundant-network-file-systems-using-glusterfs-on-centos-5