Friday, January 22, 2010

CentOS DRBD HA Howto

SkyHi @ Friday, January 22, 2010

Prerequisites

- Set up a minimal CentOS 5 installation
- Make sure that both nodes can resolve each other's names correctly (either through DNS or /etc/hosts)
- yum update (as usual ... ;-) )
- yum install heartbeat drbd kmod-drbd (available in the extras repository)
People who want to use DRBD 8.2.6 instead of 8.0.13 have to install drbd82 and kmod-drbd82. Please keep in mind that you can't mix versions, and that the configuration syntax differs between them.
Current situation:
  • node1.yourdomain.org 172.29.156.20/24 , source disc /dev/sdb that will be replicated
  • node2.yourdomain.org 172.29.156.21/24 , target disc /dev/sdb
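DRBD matches each "on" section of its configuration against the node's own hostname (the output of uname -n), while the scp and ssh commands used later refer to the peer by its short name. An /etc/hosts entry that carries both forms is therefore convenient. A minimal sketch, assuming the addresses listed above; the helper name hosts_entry is purely illustrative:

```shell
#!/bin/sh
# hosts_entry IP FQDN: print an /etc/hosts line holding the FQDN and its short alias.
# (hypothetical helper, not part of any package)
hosts_entry() {
    printf '%s\t%s %s\n' "$1" "$2" "${2%%.*}"
}

hosts_entry 172.29.156.20 node1.yourdomain.org
hosts_entry 172.29.156.21 node2.yourdomain.org
```

Append the two printed lines to /etc/hosts on both nodes, then check the result with "getent hosts node1 node2" and compare it with "uname -n" on each node.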

DRBD Configuration

We'll configure DRBD so that /dev/sdb is replicated from one node to the other (the roles can be swapped at any time, though). The DRBD resource will be named "repdata" (you can of course use any name you want). Here is the content of the /etc/drbd.conf file:
#
# please have a look at the example configuration file in
# /usr/share/doc/drbd/drbd.conf
#
global { usage-count no; }
resource repdata {
  protocol C;
  startup { wfc-timeout 0; degr-wfc-timeout     120; }
  disk { on-io-error detach; } # or panic, ...
  net {  cram-hmac-alg "sha1"; shared-secret "Cent0Sru!3z"; } # don't forget to choose a secret for auth !
  syncer { rate 10M; }
  on node1.yourdomain.org {
    device /dev/drbd0;
    disk /dev/sdb;
    address 172.29.156.20:7788;
    meta-disk internal;
  }
  on node2.yourdomain.org {
    device /dev/drbd0;
    disk /dev/sdb;
    address 172.29.156.21:7788;
    meta-disk internal;
  }
}
- replicate this config file (/etc/drbd.conf) to the second node
scp /etc/drbd.conf root@node2:/etc/
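Both nodes must end up with the same configuration file, so it can be worth comparing checksums after the copy. A small sketch of that check, assuming the POSIX cksum utility is available on both nodes; conf_sum and same_conf are illustrative names:

```shell
#!/bin/sh
# conf_sum FILE: print a checksum of FILE's content. The file is read from
# stdin so that its name is not part of the output.
conf_sum() {
    cksum < "$1"
}

# same_conf LOCAL_SUM REMOTE_SUM: succeed only if both checksums are non-empty
# and identical.
same_conf() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (run on node1):
#   same_conf "$(conf_sum /etc/drbd.conf)" \
#             "$(ssh root@node2 'cksum < /etc/drbd.conf')" \
#       && echo "drbd.conf is identical on both nodes"
```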
- Initialize the meta-data area on disk before starting drbd (on both nodes!)
[root@node1 etc]# drbdadm create-md repdata
v08 Magic number not found
v07 Magic number not found
About to create a new drbd meta data block on /dev/sdb.
 ==> This might destroy existing data! <==
Do you want to proceed? [need to type 'yes' to confirm] yes
Creating meta data...
initialising activity log
NOT initialized bitmap (256 KB)
New drbd meta data block sucessfully created.
- start drbd on both nodes (service drbd start)
[root@node1 etc]# service drbd start
Starting DRBD resources:    [ d0 n0 ]. ......
[root@node1 etc]# cat /proc/drbd
version: 8.0.4 (api:86/proto:86) SVN Revision: 2947 build by buildsvn@c5-i386-build, 2007-07-31 19:17:18
 0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
[root@node1 etc]# ssh root@node2 cat /proc/drbd
version: 8.0.4 (api:86/proto:86) SVN Revision: 2947 build by buildsvn@c5-i386-build, 2007-07-31 19:17:18
 0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
As you can see, both nodes are secondary, which is normal. We now need to decide which node will act as primary (node1); that will initiate the first 'full sync' between the two nodes:
[root@node1 etc]# drbdadm -- --overwrite-data-of-peer primary repdata
[root@node1 etc]# watch -n 1 cat /proc/drbd
version: 8.0.4 (api:86/proto:86) SVN Revision: 2947 build by buildsvn@c5-i386-build, 2007-07-31 19:17:18
 0: cs:SyncTarget st:Primary/Secondary ds:Inconsistent/Inconsistent C r---
    ns:0 nr:68608 dw:68608 dr:0 al:0 bm:4 lo:0 pe:0 ua:0 ap:0
        [>...................] sync'ed:  0.9% (8124/8191)M
        finish: 0:12:05 speed: 11,432 (11,432) K/sec
        resync: used:0/31 hits:4283 misses:5 starving:0 dirty:0 changed:5
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
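Instead of watching /proc/drbd by eye, the connection state (cs), roles (st) and disk states (ds) can be pulled out of the status line with a small script. The following is only a sketch: drbd_field and wait_for_sync are made-up names, it assumes the 8.0-style status format shown above, and it only looks at the first resource, which is enough for this single-resource setup.

```shell
#!/bin/sh
# drbd_field NAME: read /proc/drbd-style text on stdin and print the value of
# the first "NAME:value" field found (for example cs, st or ds).
drbd_field() {
    sed -n "s/.*[[:space:]]$1:\([^[:space:]]*\).*/\1/p" | head -n 1
}

# wait_for_sync [FILE]: poll FILE (default /proc/drbd) every 5 seconds until
# both the local and the peer disk are reported as UpToDate.
wait_for_sync() {
    status_file=${1:-/proc/drbd}
    until [ "$(drbd_field ds < "$status_file")" = "UpToDate/UpToDate" ]; do
        sleep 5
    done
}

# Example:
#   drbd_field cs < /proc/drbd
#   wait_for_sync && echo "initial sync finished"
```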
- we can now format /dev/drbd0 and mount it on node1 : mkfs.ext3 /dev/drbd0 ; mkdir /repdata ; mount /dev/drbd0 /repdata
- create some fake data on node 1 :
[root@node1 etc]# for i in {1..5};do dd if=/dev/zero of=/repdata/file$i bs=1M count=100;done
- now switch manually to the second node :
[root@node1 /]# umount /repdata ; drbdadm secondary repdata
[root@node2 /]# mkdir /repdata ; drbdadm primary repdata ; mount /dev/drbd0 /repdata
[root@node2 /]# ls /repdata/
file1  file2  file3  file4  file5  lost+found
Great, the data was replicated. Now let's delete one file and add another:
[root@node2 /]# rm /repdata/file2 ; dd if=/dev/zero of=/repdata/file6 bs=100M count=2
- Now switch back to the first node :
[root@node2 /]# umount /repdata/ ; drbdadm secondary repdata
[root@node1 /]# drbdadm primary repdata ; mount /dev/drbd0 /repdata
[root@node1 /]# ls /repdata/
file1  file3  file4  file5  file6  lost+found
OK, DRBD is working. Let's make sure that it is always started at boot : chkconfig drbd on
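The manual switch-over above chains its commands with ";", so the later commands still run when an earlier one fails (for example when umount reports the filesystem as busy). A hedged sketch of a wrapper that stops at the first failing step instead; run_steps, release, takeover and the DRY_RUN variable are illustrative names, and the resource, device and mount point are the ones used in this howto:

```shell
#!/bin/sh
# run_steps CMD...: run each command line in order and stop at the first failure.
# With DRY_RUN=1 the commands are only printed, not executed.
run_steps() {
    for step in "$@"; do
        echo "+ $step"
        if [ "${DRY_RUN:-0}" != 1 ]; then
            sh -c "$step" || { echo "failed: $step" >&2; return 1; }
        fi
    done
}

# release: run on the node that gives up the resource.
release() {
    run_steps "umount /repdata" "drbdadm secondary repdata"
}

# takeover: run on the node that becomes primary, after the peer has released.
takeover() {
    run_steps "drbdadm primary repdata" "mount /dev/drbd0 /repdata"
}
```

Running "DRY_RUN=1 release" first prints the steps without touching anything, which is a cheap way to review them before the real switch.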

Heartbeat V2 Configuration

Let's configure a simple /etc/ha.d/ha.cf file :
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast   eth0
node    node1.yourdomain.org
node    node2.yourdomain.org
crm yes
Also create the /etc/ha.d/authkeys file (with permissions 600 !!!) :
auth 1
1 sha1 MySecret
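"MySecret" is only a placeholder. One possible way to generate a random key and enforce the 600 permissions in a single step is sketched below; write_authkeys is a hypothetical helper, and it assumes that od, tr and /dev/urandom are available (as they are on a stock CentOS install).

```shell
#!/bin/sh
# write_authkeys FILE: write a heartbeat authkeys file with a random sha1
# secret and make sure it is readable by its owner only.
write_authkeys() {
    file=$1
    # 16 random bytes rendered as 32 hexadecimal characters
    secret=$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')
    # the subshell keeps the restrictive umask from leaking into the caller
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$secret" > "$file" )
    # also covers the case where FILE already existed with wider permissions
    chmod 600 "$file"
}

# Example (as root on node1; copy the result to node2 afterwards):
#   write_authkeys /etc/ha.d/authkeys
```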
Start the heartbeat service on node1 :
[root@node1 ha.d]# service heartbeat start
Starting High-Availability services: [OK]
Check the cluster status :
[root@node1 ha.d]# crm_mon 
Now replicate ha.cf and authkeys to node2 and start heartbeat there :
[root@node1 ha.d]# scp /etc/ha.d/ha.cf /etc/ha.d/authkeys root@node2:/etc/ha.d/
[root@node2 ha.d]# service heartbeat start
Verify the cluster with crm_mon :
=====
Last updated: Wed Sep 12 16:20:39 2007
Current DC: node1.yourdomain.org (6cb712e4-4e4f-49bf-8200-4f15d6bd7385)
2 Nodes configured.
0 Resources configured.
=====
Node: node1.yourdomain.org (6cb712e4-4e4f-49bf-8200-4f15d6bd7385): online
Node: node2.yourdomain.org (f6112aae-8e2b-403f-ae93-e5fd4ac4d27e): online
Note about the GUI : you can install heartbeat-gui (yum install heartbeat-gui) on an X workstation and connect to the cluster, but you'll need to change the password of the hacluster user account on both nodes! (Alternatively, you can use another account, provided you put it in the haclient group.)
We'll now create a resource group that contains an IP address (172.29.156.200), the DRBD device (named repdata) and the filesystem mount operation (mount /dev/drbd0 /repdata). Note : using a group is easier than using single resources : the cluster starts all the resources of a group in order (ordered=true) and on the same node (collocated=true).
Here is the content of /var/lib/heartbeat/crm/cib.xml :

 ---------------------------

As you can see, we've created a rsc_location constraint so that the cluster resources will start on the preferred node.
You can now move resources through the CLI (crm_resource) or by using the GUI (change the location constraint rule value, for example switching from node1.yourdomain.org to node2.yourdomain.org, and click on Apply). You'll be able to see all resources switching from one node to the other (IP address, DRBD and filesystem mount).

Firewall considerations

You will need to make sure that the nodes can talk to each other on these ports:
DRBD: 7788 (TCP)
HEARTBEAT: 694 (UDP)
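As an illustration, the matching iptables rules could be generated as follows. This is only a sketch: it assumes DRBD replication over TCP and heartbeat over UDP, that rules are inserted directly into the INPUT chain (a stock CentOS firewall may use its own chain instead), and cluster_fw_rules is a made-up helper name.

```shell
#!/bin/sh
# cluster_fw_rules PEER_IP: print the iptables commands that accept DRBD
# (TCP 7788) and heartbeat (UDP 694) traffic coming from the peer node.
cluster_fw_rules() {
    printf 'iptables -I INPUT -p tcp -s %s --dport 7788 -j ACCEPT\n' "$1"
    printf 'iptables -I INPUT -p udp -s %s --dport 694 -j ACCEPT\n' "$1"
}

# On node1 the peer is node2; on node2 use 172.29.156.20 instead.
cluster_fw_rules 172.29.156.21
```

The function only prints the commands, so they can be reviewed before being run as root; "service iptables save" then makes them persistent across reboots.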

 
REFERENCES http://wiki.centos.org/HowTos/Ha-Drbd