Showing posts with label Linux-HA.

Thursday, September 9, 2010

Tutorial Caching Story

SkyHi @ Thursday, September 09, 2010
This is a story of Caching
ed note: this is an overview of a basic memcached use case and of how memcached clients work
Two plucky adventurers, Programmer and Sysadmin, set out on a journey. Together they make websites. Websites with webservers and databases. Users from all over the Internet talk to the webservers and ask them to make pages for them. The webservers ask the databases for junk they need to make the pages. Programmer codes, Sysadmin adds webservers and database servers.
One day the Sysadmin realizes that their database is sick! It's spewing bile and red stuff all over! Sysadmin declares it has a fever, a load average of 20! Programmer asks Sysadmin, "well, what can we do?" Sysadmin says, "I heard about this great thing called memcached. It really helped livejournal!" "Okay, let's try it!" says the Programmer.
Our plucky Sysadmin eyes his webservers, of which he has six. He decides to use three of them to run the 'memcached' server. Sysadmin adds a gigabyte of ram to each webserver, and starts up memcached with a limit of 1 gigabyte each. So he has three memcached instances, each can hold up to 1 gigabyte of data. So the Programmer and the Sysadmin step back and behold their glorious memcached!
"So now what?" they say, "it's not DOING anything!" The memcacheds aren't talking to anything and they certainly don't have any data. And NOW their database has a load of 25!
Our adventurous Programmer grabs the pecl/memcache client library manual, which the plucky Sysadmin has helpfully installed on all SIX webservers. "Never fear!" he says. "I've got an idea!" He takes the IP addresses and port numbers of the THREE memcacheds and adds them to an array in php.
$MEMCACHE_SERVERS = array(
    "10.1.1.1", //web1
    "10.1.1.2", //web2
    "10.1.1.3", //web3
);
Then he makes an object, which he cleverly calls '$memcache'.
$memcache = new Memcache();
foreach($MEMCACHE_SERVERS as $server){
    $memcache->addServer ( $server );
}
Now Programmer thinks. He thinks and thinks and thinks. "I know!" he says. "There's this thing on the front page that runs SELECT * FROM hugetable WHERE timestamp > lastweek ORDER BY timestamp ASC LIMIT 50000; and it takes five seconds!" "Let's put it in memcached," he says. So he wraps his code for the SELECT and uses his $memcache object. His code asks:
Are the results of this select in memcache? If not, run the query, take the results, and PUT it in memcache! Like so:
$huge_data_for_front_page = $memcache->get("huge_data_for_front_page");
if($huge_data_for_front_page === false){
    $huge_data_for_front_page = array();
    $sql = "SELECT * FROM hugetable WHERE timestamp > lastweek ORDER BY timestamp ASC LIMIT 50000";
    $res = mysql_query($sql, $mysql_connection);
    while($rec = mysql_fetch_assoc($res)){
        $huge_data_for_front_page[] = $rec;
    }
    // cache for 10 minutes
    $memcache->set("huge_data_for_front_page", $huge_data_for_front_page, 600);
}
// use $huge_data_for_front_page how you please
Programmer pushes code. Sysadmin sweats. BAM! DB load is down to 10! The website is pretty fast now. So now, the Sysadmin puzzles, "What the HELL just happened!?" "I put graphs on my memcacheds! I used cacti, and this is what I see! I see traffic to one memcached, but I made three :(." So, the Sysadmin quickly learns the ascii protocol and telnets to port 11211 on each memcached and asks it:
Hey, 'get huge_data_for_front_page' are you there?
The first memcached does not answer...
The second memcached does not answer...
The third memcached, however, spits back a huge glob of crap into his telnet session! There's the data! Only one memcached has the key that the Programmer cached!
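Such a telnet session looks roughly like this (a sketch of the memcached ASCII protocol; the key name is from this story and the byte count is made up):

telnet 10.1.1.3 11211
get huge_data_for_front_page
VALUE huge_data_for_front_page 0 1048576
...one huge glob of serialized data...
END
quit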
Puzzled, he asks on the mailing list. They all respond in unison, "It's a distributed cache! That's what it does!" But what does that mean? Still confused, and a little scared for his life, the Sysadmin asks the Programmer to cache a few more things. "Let's see what happens. We're curious folk. We can figure this one out," says the Sysadmin.
"Well, there is another query that is not slow, but is run 100 times per second. Maybe that would help," says the Programmer. So he wraps that up like he did before. Sure enough, the server loads drops to 8!
So the Programmer codes more, and more things get cached. He uses new techniques. "I found them on the list and the faq! What nice blokes," he says. The DB load drops; 7, 5, 3, 2, 1!
"Okay," says the Sysadmin, "let's try again." Now he looks at the graphs. ALL of the memcacheds are running! All of them are getting requests! This is great! They're all used!
So again, he takes keys that the Programmer uses and looks for them on his memcached servers. 'get this_key' 'get that_key' But each time he does this, he only finds each key on one memcached! Now WHY would you do this, he thinks? And he puzzles all night. That's silly! Don't you want the keys to be on all memcacheds?
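Under the hood, the client answers that question for him: it hashes each key and maps it to exactly one of the configured servers, so every client agrees on where a given key lives. A rough shell sketch of the modulo-hashing idea (illustrative only; pecl/memcache uses its own hash function, not cksum):

KEY="huge_data_for_front_page"
SERVERS=3
# hash the key to a number, then pick one server index with modulo
HASH=$(printf '%s' "$KEY" | cksum | cut -d' ' -f1)
echo "key '$KEY' lives on server index $((HASH % SERVERS))"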
"But wait", he thinks "I gave each memcached 1 gigabyte of memory, and that means, in total, I can cache three gigabytes of my database, instead of just ONE! Oh man, this is great," he thinks. "This'll save me a ton of cash. Brad Fitzpatrick, I love your ass!"
"But hmm, the next problem, and this one's a puzzler, this webserver right here, this one runing memcached it's old, it's sick and needs to be upgraded. But in order to do that I have to take it offline! What will happen to my poor memcache cluster? Eh, let's find out," he says, and he shuts down the box. Now he looks at his graphs. "Oh noes, the DB load, it's gone up in stride! The load isn't one, it's now two. Hmm, but still tolerable. All of the other memcacheds are still getting traffic. This ain't so bad. Just a few cache misses, and I'm almost done with my work. So he turns the machine back on, and puts memcached back to work. After a few minutes, the DB load drops again back down to 1, where it should always be.
"The cache restored itself! I get it now. If it's not available it just means a few of my requests get missed. But it's not enough to kill me. That's pretty sweet."
So, the Programmer and Sysadmin continue to build websites. They continue to cache. When they have questions, they ask the mailing list or read the faq again. They watch their graphs. And all live happily ever after.
Author: Dormando via IRC. Edited by Brian Moon for fun. Further fun editing by Emufarmers.

REFERENCES
http://code.google.com/p/memcached/wiki/TutorialCachingStory

Thursday, September 2, 2010

HOWTO Design a fault-tolerant DHCP + DNS solution

SkyHi @ Thursday, September 02, 2010
In this article, we will describe a design for a fault-tolerant (redundant) DHCP + DNS solution on Linux.
Design criteria:
  • Failure of one DHCP server should not prevent Clients from obtaining a valid IP address.
  • Failure of one DNS server should not prevent Clients from executing DNS queries.
  • The design should allow for Dynamic DNS updates.

Design Overview

The design involves DNS Master / Slave servers as well as DHCP Primary / Secondary servers.
The sequence of events is as follows:
  1. The Client initiates a DORA (Discover, Offer, Request, Acknowledge) communications sequence with the DHCP servers.
  2. Depending on the Client MAC address, one of the DHCP servers will respond with a DHCP_OFFER.
  3. The Client obtains an IP address as well as additional network settings.
  4. The DHCP server communicates the new lease to its partner.
  5. The DHCP server sends a DNS update to the DNS Master server.
  6. The DNS Slave server(s) are kept in sync using DNS Zone Transfers.

Fault-Tolerant DHCP Service

In this design, we will use the ISC DHCP daemon. One of its features is a DHCP Primary / Secondary failover configuration, consisting of exactly two DHCP servers.
The DHCP servers share a pool of IP addresses, and keep the lease database for this pool in sync between them. If one DHCP server fails, the remaining server will continue to issue IP addresses. This guarantees uninterrupted service to the Clients.

Fault-Tolerant DNS Service

In this design, we will use the ISC DNS daemon. There are two ways to achieve fault tolerance:
  1. Master / Slave configuration
  2. Multi-Master configuration

DNS Master / Slave

DNS Master / Slave configuration is fairly straightforward. All zone data is kept on the DNS Master. The Master is configured to allow zone transfers from the DNS Slaves. Each DNS Slave performs zone transfers to obtain the most recent DNS information from the DNS Master.
Clients obtain a list of DNS servers through DHCP. If a DNS server fails, the Client will attempt to contact one of the remaining DNS servers. This guarantees uninterrupted DNS resolution service to the clients.
If the DNS Master fails, the situation becomes more severe. The DHCP servers communicate their updates only to the DNS Master server. In case of an outage, Dynamic DNS updates could be lost. One possible solution is the use of a DNS Multi-Master configuration.

DNS Multi-Master

DNS Multi-Master configuration is more complicated to configure and maintain than a Master / Slave configuration.
In case of a DNS server failure, both DNS Query and DNS Update services remain operational.

Design Decision

For our design, it is sufficient if DNS Queries from Clients remain operational. We will choose the DNS Master / Slave approach here as it is less complex, and still satisfies the design criteria.

Operational Issues

Managing combined Static and Dynamic DNS

Dynamic DNS is an all-or-nothing feature. If DDNS is enabled for a specific DNS zone, it should no longer be edited manually - your changes may be corrupted or overwritten. This often conflicts with existing systems management practices.
There are two approaches to avoid potential DNS corruption issues when managing a combined Static / Dynamic DNS environment:
Create a separate DNS zone for Dynamic DNS records
Static DNS records can still be managed in their own zone (example.local) as usual. Dynamic DNS records will be managed automatically in a separate DNS (sub-)domain, like "ddns.example.local".
Keep all DNS records in one Dynamic zone
All entries, both static and dynamic, are in the same DNS zone (example.local). Static entries should be managed using the "nsupdate" utility - this means that system management practices and tooling may need to be changed.
In this design, we will use a single Dynamic zone. Static entries will be managed using the "nsupdate" utility.




HOWTO Configure DHCP failover

The ISC DHCP server currently supports failover using a maximum of 2 servers: primary and secondary. This is an active/active setup; a simple form of load balancing is used to spread the load across both servers.
In this example, we'll be setting up failover DHCP on two servers, 192.168.123.1 and 192.168.123.2. These servers also run DNS and NTP. Dynamic clients will get an address in the range 192.168.123.100-199. Static leases are defined for several networked devices.
Since the ISC DHCP server allows the use of "include-files" in the configuration, we will use them to help keep the configurations simple and in sync across servers.

Installation

Install the following package, for example using yum:
dhcp
This example is based on version dhcp-3.0.5-18.el5.
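For example (assuming the standard CentOS base repository):
# yum -y install dhcp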

Configuration

The configuration consists of several sections, each stored in a separate file to make maintenance easier.

Failover parameters

For the Primary, define the following failover parameters in /etc/dhcpd.conf_primary:
##########################
 # DHCP Failover, Primary #
 ##########################
 
 failover peer "example" {                   # Failover configuration
        primary;                             # I am the primary
        address 192.168.123.1;               # My IP address
        port 647;
        peer address 192.168.123.2;          # Peer's IP address
        peer port 647;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 3600;
        split 128;                           # Leave this at 128, only defined on Primary
        load balance max seconds 3;
 }
For the Secondary, define the following failover parameters in /etc/dhcpd.conf_secondary:
############################
 # DHCP Failover, Secondary #
 ############################
 
 failover peer "example" {                   # Fail over configuration
        secondary;                           # I am the secondary
        address 192.168.123.2;               # My ip address
        port 647;
        peer address 192.168.123.1;          # Peer's ip address
        peer port 647;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 3600;
        load balance max seconds 3;
 }

Subnet declaration

Write a subnet declaration using our failover pool in /etc/dhcpd.conf_subnet. This section is identical on Primary and Secondary:
subnet 192.168.123.0 netmask 255.255.255.0  # zone to issue addresses from
 {
       pool {
               failover peer "example";      # Pool for dhcp leases with failover bootp not allowed
               deny dynamic bootp clients;
               range 192.168.123.100 192.168.123.190;
       }
       pool {                                # Accommodate our BOOTP clients here; no replication or failover
               range 192.168.123.191 192.168.123.199;
       }
       allow unknown-clients;
 
       authoritative;
 
       option routers             192.168.123.254;
       option subnet-mask         255.255.255.0;
       option broadcast-address   192.168.123.255;
       option domain-name         "example.local.";
       option domain-name-servers 192.168.123.1, 192.168.123.2;
       option ntp-servers         192.168.123.1, 192.168.123.2;
       option netbios-node-type   8;
 
       default-lease-time         300;
       max-lease-time             600;
 
       filename                   "/pxelinux.0";
       next-server                192.168.123.1;
 }
Note: the manpage for dhcpd.conf(5) states that dynamic BOOTP leases are not compatible with failover.
Therefore, BOOTP should be disabled in pools using failover.

Dynamic DNS

If you are configuring Dynamic DNS, write the settings in /etc/dhcpd.conf_ddns. This section is identical on Primary and Secondary:
ddns-update-style interim;
 ddns-updates on;
 ddns-domainname "example.local."; 
 ignore client-updates;
 
 # Forward zone for DNS updates
 zone example.local
 {
       primary 192.168.123.1;                # update the primary DNS
       key ddns-update;                      # key to use for the update
 }
 
 # Reverse zone for DNS updates
 zone 123.168.192.in-addr.arpa
 {
       primary 192.168.123.1;                # update the primary DNS
       key ddns-update;                      # key for update
 }
Note: for security reasons, DNS updates need to be "signed" using a shared secret (TSIG) key.
The "key ddns-update" statement specifies that DHCP will use a key named "ddns-update" during update requests.
For more information on this key, please refer to HOWTO Configure Dynamic DNS.

Static leases

For more flexible IP address management, configure all devices to use DHCP and set up static leases for these devices.
In /etc/dhcpd.conf_static, create all static leases that you may need (outside of the DHCP/BOOTP range!). Again, this section is identical on Primary and Secondary:
# Axis Security Camera
 host cam-reception {
       hardware ethernet 00:40:12:c0:ff:ee;
       fixed-address 192.168.123.200;
 }
 
 # Axis Security Camera
 host cam-fireexit {
       hardware ethernet 00:40:fe:ed:fa:ce;
       fixed-address 192.168.123.201;
 }
 
 # Axis Security Camera
 host cam-frontdoor {
       hardware ethernet 00:40:de:ad:be:ef;
       fixed-address 192.168.123.202;
 }

Overall configuration

The configuration of the Primary and Secondary DHCP servers is mostly identical, except for the Failover parameters. By keeping the sub-configurations in sync across servers (perhaps using rsync), maintenance is reduced to a minimum.
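For example, the shared include files could be pushed from the Primary to the Secondary with a single command (a sketch; the hostname "dhcp2" for the Secondary is an example):
# rsync -av /etc/dhcpd.conf_ddns /etc/dhcpd.conf_subnet /etc/dhcpd.conf_static /etc/ddns-update.dnskey dhcp2:/etc/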
The overall configuration file, /etc/dhcpd.conf, is only slightly different on Primary and Secondary.

Configuring /etc/dhcpd.conf on the Primary

# DHCP Server - Configuration file for Primary
 #
 # File $Id: dhcpd.conf,v 1.21 2009/07/09 16:26:57 root Exp root $
 
 # Global configuration
 set vendorclass = option vendor-class-identifier;
 
 # Dynamic DNS Updates
 include "/etc/ddns-update.dnskey";
 include "/etc/dhcpd.conf_ddns";
 
 # DHCP Failover, Primary
 include "/etc/dhcpd.conf_primary";
 
 # Subnet declaration
 include "/etc/dhcpd.conf_subnet";
 
 # Static IP addresses
 include "/etc/dhcpd.conf_static";
 
 # EOF

Configuring /etc/dhcpd.conf on the Secondary

# DHCP Server - Configuration file for Secondary
 #
 # File $Id: dhcpd.conf,v 1.9 2009/07/09 16:31:20 root Exp root $
 
 # Global configuration
 set vendorclass = option vendor-class-identifier;
 
 # Dynamic DNS Updates
 include "/etc/ddns-update.dnskey";
 include "/etc/dhcpd.conf_ddns";
 
 # DHCP Failover, Secondary
 include "/etc/dhcpd.conf_secondary";
 
 # Subnet declaration
 include "/etc/dhcpd.conf_subnet";
 
 # Static IP addresses
 include "/etc/dhcpd.conf_static";
 
 # EOF

Miscellaneous

SELinux considerations

By default, SELinux policy does not allow the BIND daemon (named) to write to files labeled with the named_zone_t type, which is used for master zone files. The zone files should be stored under /var/named/chroot/var/named/data or /var/named/chroot/var/named/dynamic.
# restorecon -R -v /var/named/chroot/var/named/data
 # restorecon -R -v /var/named/chroot/var/named/dynamic
This will reset the zone files to the named_cache_t type, hopefully solving the "SELinux is preventing named (named_t) "unlink"" error messages.

Firewall settings

Your firewall should allow inbound DHCP traffic on 67/UDP and the failover port 647/TCP; if you use the PXE boot options above, also allow TFTP on 69/UDP. Sample entries for /etc/sysconfig/iptables:
# DHCP server
 -A INPUT -p udp -m udp --dport 67 -j ACCEPT
 # TFTP for PXE boot (optional)
 -A INPUT -p udp -m udp --dport 69 -j ACCEPT
 # DHCP failover
 -A INPUT -m state --state NEW -m tcp -p tcp --dport 647 -j ACCEPT

Starting the service

On both DHCP Primary and Secondary, run the following commands as root:
# chkconfig dhcpd on
 # service dhcpd start
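To check the configuration syntax before a restart, and to watch the failover state being negotiated in the logs, the following can be useful (a sketch; the log location assumes the default syslog setup):
# dhcpd -t -cf /etc/dhcpd.conf
 # tail -f /var/log/messages | grep dhcpd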


HOWTO Configure Dynamic DNS

In this example, we will set up a DNS Master and DNS Slave server, on 192.168.123.1 and 192.168.123.2 respectively.
The configuration will also allow for Dynamic DNS updates from our DHCP servers.

Installation

Install the following packages, for example using yum:
bind
 bind-chroot
 bind-utils
This example is based on bind-9.3.4-10.P1.el5_3.1.
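For example:
# yum -y install bind bind-chroot bind-utils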

Configuration

The configuration and data files for the chroot()-ed BIND DNS server can be found under /var/named/chroot/. Under /etc, you will find a symlink pointing to /var/named/chroot/etc/named.conf.

DNS Keys

For Dynamic DNS to work, the updates need to be "signed" using a transaction key. Since this is a symmetric key, it has to be shared between DNS and DHCP. It must be protected to prevent unauthorized changes being made to your DNS zones. The key has to be available on both DHCP servers. Generate the key as follows:
# cd /tmp
 # dnssec-keygen -a HMAC-MD5 -b 512 -n HOST ddns-update
These commands generate a .key and a .private file in the current working directory. Rename and move these files to a more convenient location:
# mv Kddns-update.*.key /etc/ddns-update.key
 # cat /etc/ddns-update.key 
 ddns-update. IN KEY 512 3 157 K3EaOD3IysiC/D7lIXp+4hrYGDLyIq6la[...]9oE4kZ3O1ZFxKSMHfwG5YvUkYE7gxMHCmCg==
 
 # mv Kddns-update.*.private /etc/ddns-update.private
 # cat /etc/ddns-update.private 
 Private-key-format: v1.2
 Algorithm: 157 (HMAC_MD5)
 Key: K3EaOD3IysiC/D7lIXp+4hrYGDLyIq6la[...]9oE4kZ3O1ZFxKSMHfwG5YvUkYE7gxMHCmCg==
Note that the actual private and public keys are identical for HMAC-MD5. This is normal. The .key and .private files are needed by the nsupdate utility, later on.
We now need to create a configuration file in a different format for use by the DHCP and DNS servers; we will call this file /etc/ddns-update.dnskey. The syntax is identical to the /etc/rndc.key file. We need to set the key name and the key value properly:
# cat /etc/ddns-update.dnskey 
 key "ddns-update" {
       algorithm hmac-md5;
       secret "K3EaOD3IysiC/D7lIXp+4hrYGDLyIq6la[...]9oE4kZ3O1ZFxKSMHfwG5YvUkYE7gxMHCmCg==";
 };
Make sure it has the proper ownership and permissions:
# ls -l /etc/ddns-update.dnskey
 -rw-r----- 1 root named 145 Jul  9 12:25 ddns-update.dnskey
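If the ownership or mode differs on your system, it can be set like this (a sketch):
# chown root:named /etc/ddns-update.dnskey
 # chmod 640 /etc/ddns-update.dnskey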
On the Primary DHCP / Master DNS server, the key needs to be available both as /etc/ddns-update.dnskey (for DHCP) and /var/named/chroot/etc/ddns-update.dnskey (for DNS). Creating a symlink will not work due to the SELinux policy; you will have to copy the file instead, so each copy has its own SELinux context:
# cp /etc/ddns-update.dnskey /var/named/chroot/etc/
 # ls -lZ /etc/ddns-update.dnskey /var/named/chroot/etc/ddns-update.dnskey 
 -rw-r-----  root root  root:object_r:etc_t              /etc/ddns-update.dnskey
 -rw-r-----  root named root:object_r:named_conf_t       /var/named/chroot/etc/ddns-update.dnskey

DNS Master configuration

On the Master, we will define all zones that we are authoritative for. We will also allow DNS updates to these zones from our DHCP servers.
# ISC BIND Configuration File
 #
 # Purpose:
 #   Configure BIND as caching/forwarding nameserver with authority
 #   for local networks as well as support for Dynamic DNS Updates
 #
 # File $Id: named.conf,v 1.4 2009/07/07 12:59:12 root Exp root $
 
 options {
       directory "/etc";
       pid-file "/var/run/named/named.pid";
       forwarders {
               // Put your ISP's DNS servers here
               66.159.123.200;
               66.159.123.201;
       };
       allow-query { localhost; localnets; };
 };
 
 # Key used by DHCP servers for Dynamic DNS Updates
 include "/etc/ddns-update.dnskey";
 
 zone "example.local" {
       type master;
       file "/var/named/data/example.local.zone";
       allow-transfer { 192.168.123.2; };
       allow-update { key "ddns-update"; };
 };
 
 zone "123.168.192.in-addr.arpa" {
       type master;
       file "/var/named/data/192.168.123.zone";
       allow-transfer { 192.168.123.2; };
       allow-update { key "ddns-update"; };
 };
 
 # EOF
SELinux Note: On the DNS Master, use the "data" sub-directory to store zone files.
Otherwise, you will see errors while trying to create journal files on the Master.

DNS Slave configuration

You can have multiple DNS Slave servers. Each will perform a zone transfer regularly, keeping the data in sync.
Dynamic DNS updates originating from our DHCP servers are sent to the DNS Master only.
# ISC BIND Configuration File
 #
 # Purpose:
 #   Configure BIND as caching/forwarding slave nameserver
 #
 # File $Id: named.conf,v 1.4 2009/07/08 02:02:19 root Exp $
 
 options {
       directory "/etc";
       pid-file "/var/run/named/named.pid";
       forwarders {
               // Put your ISP's DNS servers here
               66.159.123.200;
               66.159.123.201;
       };
       allow-query { localhost; localnets; };
       allow-notify { 192.168.123.2; };
 };
 
 # Dynamic DNS Updates are only sent to the Primary DNS
 
 zone "example.com" {
       type slave;
       masters { 192.168.123.1; };
       file "/var/named/slaves/example.com.zone";
 };
 
 zone "123.168.192.in-addr.arpa" {
       type slave;
       masters { 192.168.123.1; };
       file "/var/named/slaves/192.168.123.zone";
 };
The "allow-notify" option prevents BIND from generating error messages as it apparently tries to notify itself of updates. Go figure ;-)
SELinux Note: On the DNS Slave, use the "slaves" sub-directory to store data from the DNS Master.
Otherwise, you will get a "permission denied" error on the Slave while trying to transfer the zones from the Master.

DNS Zone files

On the DNS Master, we create a minimal set of zone files (forward and reverse zones). Entries will be managed either by DHCP or nsupdate.
/var/named/data/example.local.zone:
 ; DO NOT EDIT MANUALLY - use the "nsupdate" utility to prevent data loss
 ;
 $ORIGIN example.local.
 $TTL 86400 ; 1 day
 @  IN SOA ns1.example.local. hostmaster.example.local. (
     2009074711 ; serial
     7200       ; refresh (2 hours)
     300        ; retry (5 minutes)
     604800     ; expire (1 week)
     60         ; minimum (1 minute)
     )
   IN NS ns1.example.local.
   IN NS ns2.example.local.
 ns1  IN A 192.168.123.1
 ns2  IN A 192.168.123.2
/var/named/data/192.168.123.zone:
 ; DO NOT EDIT MANUALLY - use the "nsupdate" utility to prevent data loss
 ;
 $ORIGIN 123.168.192.in-addr.arpa.
 $TTL 86400 ; 1 day
 @  IN SOA ns1.example.local. hostmaster.example.local. (
     2009074711 ; serial
     7200       ; refresh (2 hours)
     300        ; retry (5 minutes)
     604800     ; expire (1 week)
     60         ; minimum (1 minute)
     )
   IN NS ns1.example.local.
   IN NS ns2.example.local.
 1  IN PTR ns1.example.local.
 2  IN PTR ns2.example.local.

Miscellaneous

Client configuration

On RHEL/CentOS/Fedora clients, you should edit /etc/sysconfig/network-scripts/ifcfg-eth0 and set the DHCP_HOSTNAME variable to the short hostname of your machine. The client will now send its hostname to the DHCP server during IP address negotiation. The DHCP_HOSTNAME is used for updating Dynamic DNS. Sample:
# Sample Network Device
 DEVICE=eth0
 HWADDR=00:16:de:ad:be:ef
 ONBOOT=yes
 BOOTPROTO=dhcp
 DHCP_HOSTNAME=demo01

Using nsupdate to add or remove DNS entries

Adding a host (A and PTR records)

# nsupdate -k /etc/ddns-update.key
 > update add gateway.example.local 38400 A 192.168.123.254
 > 
 > update add 254.123.168.192.in-addr.arpa. 38400 PTR gateway.example.local.
 >
 > quit
Note: The empty line is necessary; it sends the update to the DNS server. Since we are adding records to two different zones, we need to send two separate updates.

Deleting a host (A and PTR records)

# nsupdate -k /etc/ddns-update.key 
 > update delete gateway.example.local IN A 192.168.123.254
 > 
 > update delete 254.123.168.192.in-addr.arpa PTR gateway.example.local.
 > 
 > quit

Adding a mail-host (MX records)

The domain "example.local" wishes to use "mail.example.local" as their primary mail host.
We first need to add the standard A and PTR records for the mailhost (TTL 86400 seconds), followed by the MX record for the domain:
# nsupdate -k /etc/ddns-update.key 
 > update add mail.example.local 86400 IN A 192.168.123.25
 > 
 > update add 25.123.168.192.in-addr.arpa. 86400 PTR mail.example.local.
 > 
 > update add example.local 86400 MX 10 mail.example.local.
 > 
 > quit
Note: The mailhost should of course be accessible from the Internet and use a routable IP address instead of an RFC1918 address.
Verify the results using 'dig':
# dig example.local MX
 
 ; <<>> DiG 9.3.4-P1 <<>> example.local MX
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15733
 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
 
 ;; QUESTION SECTION:
 ;example.local.   IN MX
 
 ;; ANSWER SECTION:
 example.local.  86400 IN MX 10 mail.example.local.
 
 ;; AUTHORITY SECTION:
 example.local.  86400 IN NS ns2.example.local.
 example.local.  86400 IN NS ns1.example.local.
 
 ;; ADDITIONAL SECTION:
 mail.example.local. 86400 IN A 192.168.123.25
 ns1.example.local. 86400 IN A 192.168.123.1
 ns2.example.local. 86400 IN A 192.168.123.2
 
 ;; Query time: 1 msec
 ;; SERVER: 127.0.0.1#53(127.0.0.1)
 ;; WHEN: Fri Jul 31 11:34:29 2009
 ;; MSG SIZE  rcvd: 134

Deleting a mail-host (MX records)

If we wish to remove the mail-host, just delete the MX, A and PTR records:
# nsupdate -k /etc/ddns-update.key 
 > update delete example.local MX 10 mail.example.local.
 > 
 > update delete mail.example.local IN A 192.168.123.25
 > 
 > update delete 25.123.168.192.in-addr.arpa PTR mail.example.local.
 > 
 > quit
Note: Mail may continue to be delivered to the old mailhost until the TTL expires!

Debugging

During development, you may want to enable some extra logging in /etc/named.conf:
logging {
       channel update_debug {
               file "/var/named/data/named-update.log";
               severity  debug 3;
               print-category yes;
               print-severity yes;
               print-time     yes;
       };
 
       channel security_info    {
               file "/var/named/data/named-auth.log";
               severity  debug 3;
               print-category yes;
               print-severity yes;
               print-time     yes;
       };
 
       category update { update_debug; };
       category security { security_info; };
 };

Starting the service

On both Master and Slave DNS, start the BIND nameserver:
# chkconfig named on
 # service named start
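The configuration and zone files can also be checked with the utilities that ship with BIND (a sketch; the paths match the chroot layout used above):
# named-checkconf -t /var/named/chroot /etc/named.conf
 # named-checkzone example.local /var/named/chroot/var/named/data/example.local.zone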
HOWTO Manage Dynamic DNS with nsupdate


Service (SRV) records

Adding SRV records for your IPA Server

After installing the IPA Server ("apollo" in this example), you should add some service-records to DNS for IPA discovery. The installer leaves a sample DNS zone file in /tmp. This is how I added the relevant records using nsupdate:
# nsupdate -k /etc/ddns-update.key
 > update add _ldap._tcp.example.local. 86400 IN SRV 0 100 389 apollo
 > 
 > update add _kerberos._tcp.example.local. 86400 IN SRV 0 100 88 apollo
 > 
 > update add _kerberos._udp.example.local. 86400 IN SRV 0 100 88 apollo
 > 
 > update add _kerberos-master._tcp.example.local. 86400 IN SRV 0 100 88 apollo
 > 
 > update add _kerberos-master._udp.example.local. 86400 IN SRV 0 100 88 apollo
 > 
 > update add _kpasswd._tcp.example.local. 86400 IN SRV 0 100 464 apollo
 > 
 > update add _kpasswd._udp.example.local. 86400 IN SRV 0 100 464 apollo
 > 
 > quit
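The new records can then be verified with dig, which should return something like "0 100 389 apollo.example.local.":
# dig _ldap._tcp.example.local SRV +short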



Wednesday, April 21, 2010

Gratuitous ARP

SkyHi @ Wednesday, April 21, 2010

Gratuitous ARP could mean either a gratuitous ARP request or a gratuitous ARP reply. Gratuitous in this case means a request/reply that is not normally needed according to the ARP specification (RFC 826) but could be used in some cases. A gratuitous ARP request is an Address Resolution Protocol (ARP) request packet where the source and destination IP are both set to the IP of the machine issuing the packet and the destination MAC is the broadcast address ff:ff:ff:ff:ff:ff. Ordinarily, no reply packet will occur. A gratuitous ARP reply is a reply to which no request has been made.

Gratuitous ARPs are useful for four reasons:

  • They can help detect IP conflicts. When a machine receives an ARP request containing a source IP that matches its own, then it knows there is an IP conflict.
  • They assist in the updating of other machines' ARP tables. Clustering solutions utilize this when they move an IP from one NIC to another, or from one machine to another. Other machines maintain an ARP table that contains the MAC associated with an IP. When the cluster needs to move the IP to a different NIC, be it on the same machine or a different one, it reconfigures the NICs appropriately then broadcasts a gratuitous ARP reply to inform the neighboring machines about the change in MAC for the IP. Machines receiving the ARP packet then update their ARP tables with the new MAC.

  • They inform switches of the MAC address of the machine on a given switch port, so that the switch knows that it should transmit packets sent to that MAC address on that switch port.
  • Every time an IP interface or link goes up, the driver for that interface will typically send a gratuitous ARP to preload the ARP tables of all other local hosts. Thus, a gratuitous ARP will tell us that that host just has had a link up event, such as a link bounce, a machine just being rebooted or the user/sysadmin on that host just configuring the interface up. If we see multiple gratuitous ARPs from the same host frequently, it can be an indication of bad Ethernet hardware/cabling resulting in frequent link bounces.


Examples


  • The networking stack in many operating systems will issue a gratuitous ARP if the IP or MAC address of a network interface changes, to inform other machines on the network of the change so they can report IP address conflicts, let other machines update their ARP tables, and inform switches of the MAC address of the machine. The networking stack in many operating systems will also issue a gratuitous ARP on an interface every time the link to that interface has been brought to the up state. The gratuitous ARP then is used to preload the ARP table on all local hosts with the possibly new mapping between MAC and IP address (for failover clusters that do not take over the MAC address), or to let the switch relearn behind which port a certain MAC address resides (for failover clusters where you do pull the MAC address over as well, or when you simply move the network cable from one port to another on a normal non-clustered host).
  • The High-Availability Linux Project utilizes a command-line tool called send_arp to perform the gratuitous ARP needed in their failover process. A typical clustering scenario might play out like the following:

    • Two nodes in a cluster are configured to share a common IP address 192.168.1.1. Node A has a hardware address of 01:01:01:01:01:01 and node B has a hardware address of 02:02:02:02:02:02.

    • Assume that node A currently has IP address 192.168.1.1 already configured on its NIC. At this point, neighboring devices know to contact 192.168.1.1 using the MAC 01:01:01:01:01:01.

    • Using the heartbeat protocol, node B determines that node A has died.
    • Node B configures a secondary IP on an interface with ifconfig eth0:1 192.168.1.1.

    • Node B issues a gratuitous ARP with send_arp eth0 192.168.1.1 02:02:02:02:02:02 192.168.1.255. All devices receiving this ARP update their table to point to 02:02:02:02:02:02 for the IP address 192.168.1.1.


Example Traffic


Ethernet II, Src: 02:02:02:02:02:02, Dst: ff:ff:ff:ff:ff:ff
Destination: ff:ff:ff:ff:ff:ff (Broadcast)
Source: 02:02:02:02:02:02 (02:02:02:02:02:02)
Type: ARP (0x0806)
Trailer: 000000000000000000000000000000000000
Address Resolution Protocol (request/gratuitous ARP)
Hardware type: Ethernet (0x0001)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (0x0001)
Sender MAC address: 02:02:02:02:02:02 (02:02:02:02:02:02)
Sender IP address: 192.168.1.1 (192.168.1.1)
Target MAC address: ff:ff:ff:ff:ff:ff (Broadcast)
Target IP address: 192.168.1.1 (192.168.1.1)
0000 ff ff ff ff ff ff 02 02 02 02 02 02 08 06 00 01 ................
0010 08 00 06 04 00 01 02 02 02 02 02 02 c0 a8 01 01 ................
0020 ff ff ff ff ff ff c0 a8 01 01 00 00 00 00 00 00 ................
0030 00 00 00 00 00 00 00 00 00 00 00 00 ............
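For testing, a similar gratuitous ARP can be generated from a Linux host with the arping utility from iputils (a sketch; the interface name and IP address are examples):

arping -U -c 1 -I eth0 192.168.1.1   # gratuitous ARP request
arping -A -c 1 -I eth0 192.168.1.1   # gratuitous ARP reply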


Discussion


What's a good choice for example MACs? I picked 02:02:02:02:02:02. Is there a better one? -- RandyMcEoin

-The '02' byte at the start of the MAC indicates that this is a 'locally administered address' which has been set by the local user or system. Most normal ethernet devices are allocated a MAC with 00 as the most significant byte.

I updated the article to differentiate between gratuitous ARP request and reply.

Note that some devices will respond to the gratuitous request and some will respond to the gratuitous reply. If one is trying to write software for moving IP addresses around that works with all routers, switches and IP stacks, it is best to send both the request and the reply. These are documented by RFC 2002 and RFC 826. Software implementing the gratuitous ARP function can be found in the Linux-HA source tree. A request may be preceded by a probe to avoid polluting the address space. For an ARP Probe the Sender IP address field is 0.0.0.0. ARP probes were not considered by the original ARP RFC.

-Does the target MAC address ever matter in requests? I gather Solaris uses ff:ff:ff:ff:ff:ff in its standard ARP requests and most other OSes use 00:00:00:00:00:00 instead. Is the use of the ff:ff:ff:ff:ff:ff MAC in the target address above significant in any way? Obviously having a destination address of ff:ff:ff:ff:ff:ff is critical.

RFC 3927, which is based on Gratuitous ARP, specifies 00:00:00:00:00:00 for the target MAC. However many simple TCP/IP stacks have an API which permits the specification of only one MAC value, and when the Ethernet Destination field is set to 'broadcast', the ARP target is also set 'broadcast'. Note: Normal ARP requests have the same value in the ARP Packet Target MAC address as in the Ethernet Destination field.

- How can we explain if the source Ethernet MAC address is different from sender's MAC address in a GARP packet? The ARP packet value is for the ARP machine, the Ethernet value is for the Ethernet machine. Originally, they were intended to be redundant information, targeted at different layers. It is possible to consider a hypothetical network appliance that routes ARP packets, where the source Ethernet MAC address changes as the packet is routed, but normally ARP packets are not routed.

REFERENCE
http://wiki.wireshark.org/Gratuitous_ARP

Friday, January 22, 2010

Distributed and redundant network file systems using GlusterFS on CentOS 5

SkyHi @ Friday, January 22, 2010
My notes on how to install GlusterFS on CentOS 5 servers. This example uses four Gluster servers (named: gluster1, gluster2, etc) in a 2 by 2 distributed-replicated setup (i.e. two copies of the data for redundancy, over two servers for twice the capacity of one). This work was actually carried out on a cluster of KVM virtual servers (if that makes a difference to you). This is not a definitive guide, but instead a note of what I did to get it working. The only undocumented problem I had is addressed in step three. The main source of the documentation is here.

1. Install the Glusterfs RPMs from http://ftp.gluster.com/pub/gluster/glusterfs/2.0/LATEST/CentOS/ (note that the machine you will be using this cluster from doesn’t need the third glusterfs-server package).

  • glusterfs-client-2.0.6-1.i386.rpm
  • glusterfs-common-2.0.6-1.i386.rpm
  • glusterfs-server-2.0.6-1.i386.rpm

2. Install the FUSE modules via DKMS, which is available in the rpmforge repos:

  • rpm -ivh http://packages.sw.be/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
  • yum install dkms-fuse

3. HACK: The DKMS system wasn’t working as it couldn’t find the kernel headers needed, so a quick symlink:

  • cd /usr/src/kernels && ln -s 2.6.18-128.7.1.el5-i686 2.6.18-128.el5-i686
  • /etc/init.d/dkms_autoinstaller start

4. Now start the gluster server configuration (remember to create /data/export). Put the gluster server config file in place at /etc/glusterfs/glusterfsd.vol:

# file: /etc/glusterfs/glusterfsd.vol
volume posix
type storage/posix
option directory /data/export
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume brick
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume

volume server
type protocol/server
option transport-type tcp
option auth.addr.brick.allow *
subvolumes brick
end-volume

5. Restart the glusterfs service with:

/etc/init.d/glusterfsd restart

6. Now on the client (i.e. whichever server you want to access this data from), add this configuration in /etc/glusterfs/glusterfs-client.vol and remember to change the option remote-host to reflect the server hostname or IP.

# file: /etc/glusterfs/glusterfs-client.vol
volume remote1
type protocol/client
option transport-type tcp
option remote-host gluster1
option remote-subvolume brick
end-volume

volume remote2
type protocol/client
option transport-type tcp
option remote-host gluster2
option remote-subvolume brick
end-volume

volume remote3
type protocol/client
option transport-type tcp
option remote-host gluster3
option remote-subvolume brick
end-volume

volume remote4
type protocol/client
option transport-type tcp
option remote-host gluster4
option remote-subvolume brick
end-volume

volume replicate1
type cluster/replicate
subvolumes remote1 remote2
end-volume

volume replicate2
type cluster/replicate
subvolumes remote3 remote4
end-volume

volume distribute
type cluster/distribute
subvolumes replicate1 replicate2
end-volume

volume writebehind
type performance/write-behind
option window-size 1MB
subvolumes distribute
end-volume

volume cache
type performance/io-cache
option cache-size 512MB
subvolumes writebehind
end-volume

7. Now on the client you can manually mount the Glusterfs share with:

glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs

or automatically in your fstab file with this line:

/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs glusterfs defaults 0 0

8. Now there are various tests you can do to try it out. I copied a load of RPMs from a repo mirror into /mnt/glusterfs and then stopped networking on the gluster4 node. When I brought it online again later, the moment I tried reading from the mount it re-synced the missing data to gluster4 and it was like nothing had happened. Impressive!
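A simple way to check the replication by hand (a sketch; any test file will do):

# on the client
cp /etc/hosts /mnt/glusterfs/testfile
# on the gluster servers: with this 2x2 layout the file should show up in
# /data/export on the two servers of one replicate pair
ls -l /data/export/testfile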

REFERENCE

http://blog.coopersphotos.co.uk/technology/distributed-and-redundant-network-file-systems-using-glusterfs-on-centos-5



A HA Two-Node Server Side Cluster Using Glusterfs and CentOS

SkyHi @ Friday, January 22, 2010
I came across Glusterfs the other day. On the surface it seemed similar to DRBD, but after closer examination I realized it was completely different. After some reading I came to realize it may offer benefits over DRBD and GNBD, and it seemed extremely straightforward to implement, so I decided to test a two-node cluster and run Apache on it.
Network Setup

The logical network setup is basically a two-node cluster using the server side replication capability (as opposed to client side replication). In this fashion the client(s) which will mount the exported mount points only need to worry about serving the data through an application, Apache in this case.

The HA will be achieved by using round-robin DNS (RRDNS) from the client to the servers; that is, when the client issues a request to the servers it will do so using a FQDN for the cluster, and if one of the server nodes is down it will switch to the other node. When the failed node comes back, the self-healing properties of Glusterfs will ensure that data is replicated from the remaining node.

The nodes will be configured as follows:
    * Node 0: 10.0.0.1 (client)
    * Node 1: 10.0.0.2
    * Node 2: 10.0.0.3
    * Node 1 and node 2 will have /ha exported
    * The client will mount /ha
For my example I only have 1 client, but this setup is easily extendable to 2 or more clients as well as two or more servers. My setup is of course nothing new; it is based on several examples at the Glusterfs-Wiki. I took a couple of them and redid the configuration to suit my test needs. In particular you should look at:
    * High-availability storage using server-side AFR by Daniel Maher
    * Setting up AFR on two servers with server side replication by Brandon Lamb
Prep Work

Glusterfs has some dependencies:
   1. Infiniband support if you use an infiniband network
   2. Berkeley DB support
   3. Extended attributes support for the backend system (exported filesystem) ext3 in our case.
   4. FUSE is the primary requirement for Glusterfs.
I will be using CentOS 5.2, the first 3 requirements do come standard with it, Fuse does not so you need to install it.

Install FUSE

Fuse is not available from the standard repository so you need to get it from RPMFORGE.

Follow the instructions and install support for the repository and YUM. Make sure to also configure YUM to use the priorities plug-in. This makes sure that the standard repositories are used before the RPMFORGE repositories if you use automatic updates or if you want to install an update for a particular package without breaking anything on your system.

When you have the repository support installed, issue the following at a command prompt:
# yum -y install fuse fuse-devel
The command above will install fuse and libraries plus any other packages needed by it.

Install the FUSE Kernel Module

Make sure you have the kernel-devel package for your kernel. At a prompt issue:
# yum info kernel-devel
You should see something like this:

Installed Packages
Name                 : kernel-devel
Arch                 : i686
Version         : 2.6.18
Release         : 92.el5
Size                 : 15 M
Repo                 : installed
Summary         : Development package for building kernel modules to match the kernel.
Description         : This package provides kernel headers and makefiles sufficient to build
                : modules against the kernel package.
If it says "installed" then you are OK; otherwise you need to install it. Issue:
yum -y install kernel-devel-2.6.18-92.el5 kernel-headers-2.6.18-92.el5 dkms-fuse
The command above will install the kernel source and install the source for the Fuse kernel module.
Change directories to /usr/src/fuse-2.7.4-1.nodist.rf and issue: ./configure; make install
This will install the fuse.ko kernel module. Finally do a chkconfig fuse on to enable fuse support at boot up. Start the fuse service: service fuse start.

Repeat the above procedure on all 3 nodes (the client and the 2 servers).

Install Glusterfs

Download the latest here (1.3.9). If you are using a 64 bit architecture get the corresponding RPMs; otherwise you will have to compile the RPMs from the SRPM using the following command: rpmbuild --rebuild glusterfs-1.3.9-1.src.rpm. This will create the following in /usr/src/redhat/RPMS/i386:
    * glusterfs-1.3.9-1.i386.rpm
    * glusterfs-devel-1.3.9-1.i386.rpm
    * glusterfs-debuginfo-1.3.9-1.i386.rpm
Copy the files to the other nodes and install the RPMs with: rpm -ivh glusterfs*. Verify the installation was successful by issuing:
# glusterfs --version
No errors should be reported.

Round-Robin DNS

A key component of the HA setup is RRDNS. Though it is used only in one instance, it is a critical function - one which helps to ensure that the data can be served continuously even in the event that one of the storage servers becomes inaccessible.

Normally in a standard configuration a client will access the servers via their IP addresses. The major drawback of this setup is that if a server becomes inaccessible, the client will be unable to access the data. This can be mitigated by using a hostname rather than addresses to access the servers.

Consider the following:
$ host node1.mycluster.com
node1.mycluster.com has address 10.0.0.2
$ host node2.mycluster.com
node2.mycluster.com has address 10.0.0.3
$ host cluster.mycluster.com
cluster.mycluster.com has address 10.0.0.2
cluster.mycluster.com has address 10.0.0.3
$ dig cluster.mycluster.com | grep -A 2 "ANSWER SECTION"
;; ANSWER SECTION:
cluster.mycluster.com. 3600 IN A 10.0.0.2
cluster.mycluster.com. 3600 IN A 10.0.0.3
So you need to configure the zone file for the mycluster.com zone to serve the corresponding records for all the server nodes and for the FQDN of the cluster. Configuration and setup of such a DNS zone are well documented on the Internet, so it will be left as an exercise for the reader.
Configure Glusterfs

Now that all the prep work, RPMS and RRDNS is in place we are ready to configure the cluster. The main key in the setup is the “AFR translator”; this is the mechanism that ensures data (“subvolumes”) is replicated between servers.

The reader is encouraged to visit the Gluster-Wiki and go over the fundamentals of Glusterfs and in particular over the performance options used in this setup (readahead, writeback, cache-size, etc).

Node 1

The following is the configuration of node 1:
[root@node1 ~]# more /etc/glusterfs/glusterfs-server.vol
# Dataspace on Node1
volume gfs-ds
type storage/posix
option directory /ha
end-volume
# posix locks
volume gfs-ds-locks
type features/posix-locks
subvolumes gfs-ds
end-volume
# Dataspace on Node2
volume gfs-node2-ds
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.3 # IP address of node2
option remote-subvolume gfs-ds-locks
option transport-timeout 5 # value in seconds
end-volume
# automatic file replication translator for dataspace
volume gfs-ds-afr
type cluster/afr
subvolumes gfs-ds-locks gfs-node2-ds
end-volume
# the actual exported volume
volume gfs
type performance/io-threads
option thread-count 8
option cache-size 64MB
subvolumes gfs-ds-afr
end-volume
volume server
type protocol/server
option transport-type tcp/server
subvolumes gfs
option auth.ip.gfs-ds-locks.allow 10.0.0.*,127.0.0.1
option auth.ip.gfs.allow 10.0.0.*,127.0.0.1
end-volume
Node 2

The configuration of node 2:
[root@node2 ~]# more /etc/glusterfs/glusterfs-server.vol
# Dataspace on Node2
volume gfs-ds
type storage/posix
option directory /ha
end-volume
# posix locks
volume gfs-ds-locks
type features/posix-locks
subvolumes gfs-ds
end-volume
# Dataspace on Node1
volume gfs-node1-ds
type protocol/client
option transport-type tcp/client
option remote-host 10.0.0.2 # IP address of node1
option remote-subvolume gfs-ds-locks
option transport-timeout 5 # value in seconds
end-volume
# automatic file replication translator for dataspace
volume gfs-ds-afr
type cluster/afr
subvolumes gfs-ds-locks gfs-node1-ds
end-volume
# the actual exported volume
volume gfs
type performance/io-threads
option thread-count 8
option cache-size 64MB
subvolumes gfs-ds-afr
end-volume
volume server
type protocol/server
option transport-type tcp/server
subvolumes gfs
option auth.ip.gfs-ds-locks.allow 10.0.0.*,127.0.0.1
option auth.ip.gfs.allow 10.0.0.*,127.0.0.1
end-volume
Client

Finally the client configuration:
[root@node0 ~]# more /etc/glusterfs/glusterfs-client.vol
# the exported volume to mount
volume cluster
type protocol/client
option transport-type tcp/client # For TCP/IP transport
option remote-host cluster.mycluster.com # FQDN of the cluster
option remote-subvolume gfs # Exported volume
option transport-timeout 10 # Value in seconds
end-volume
# performance block for cluster # optional!
volume writeback
type performance/write-behind
option aggregate-size 131072
subvolumes cluster
end-volume
# performance block for cluster # optional!
volume readahead
type performance/read-ahead
option page-size 65536
option page-count 16
subvolumes writeback
end-volume
Start Gluster on Servers and Clients

On both servers make sure:
    * /ha exist
    * If /ha is a mount point make sure the file system has been created (in our case ext3)
    * Mount /ha on both servers
    * The configuration files exist
On the client make sure:
    * The fuse service is started and the kernel module is loaded.
    * The client configuration exist
Make sure the FQDN cluster.mycluster.com resolves to both addresses (10.0.0.2, 10.0.0.3). Finally, make sure the clock is synchronized on all three nodes.
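A quick way to verify those last two points (a sketch; adjust names to your environment):

$ dig +short cluster.mycluster.com
$ ntpdate -q pool.ntp.org    # run on each node and compare the reported offsets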

Start gluster on both servers:
[root@node2 ~]# glusterfsd -f /etc/glusterfs/glusterfs-server.vol
[root@node2 ~]# tail /var/log/glusterfsd.log
Start Gluster on the client and mount /ha:
[root@node0 ~]# glusterfs -f /etc/glusterfs/glusterfs-client.vol /ha
[root@node0 ~]# cd /ha
[root@node0 ~]# ls -l
Test that things are working ok by creating a new file:
[root@node0 ~]# cd /ha
[root@node0 ~]# touch file.txt
[root@node0 ~]# ls
file.txt
Now go to both servers and see if the file was created on both servers under /ha. If the file exists on both, congratulations!
Configure Apache

The configuration of Apache is identical to the one we did for our two-node gnbd cluster. Copy all the web files to the new document root, which will be /ha, then on the client (node 0) change the Apache configuration file to point to the correct root of the web server (i.e. /ha).

When done you should be able to go to the IP of the client via a web browser and get the Apache default page.
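A minimal sketch of that change on node 0, assuming a stock CentOS httpd install (the sed expression is an example; adjust it to your actual httpd.conf):

[root@node0 ~]# sed -i 's|^DocumentRoot .*|DocumentRoot "/ha"|' /etc/httpd/conf/httpd.conf
[root@node0 ~]# service httpd restart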
Testing Failover

To test failover simulate a failure of one of the nodes, for our test we will choose node 1. So issue:
[root@node1 ~]# killall glusterfsd
Wait a few seconds and then browse to the IP of node 0. You should get a response; if you do, congrats once more! You have a working two-node cluster.

Test recovery by restarting gluster on node 1, before you do touch a file on the client:
[root@node0 ~]# echo "testing failure" > /ha/test.txt
The file should appear on node 2, but node 1 should not have it. Start gluster on node 1; after it starts you will notice that the file "test.txt" does not appear in /ha on node 1 even though the cluster is once again up.

Self-healing will eventually synchronize the files from node 2 to node 1 when the new files are accessed on node 2. If you want to force self-healing to happen, use a script that accesses the files that have changed, or simply do a:
[root@node2 ~]# ls -lR
The above will force self-healing; after you do this, "test.txt" will appear on node 1.
Conclusions

The setup presented here will give you a two-node cluster using server-to-server replication and a client that will access the data. The setup can easily scale up to 2 or more servers and 2 or more clients.

Of course in a production environment you should:

    * Use bonded interfaces on all nodes (servers and clients).
    * Use a dedicated network for cluster communication; see the article by Daniel Maher referenced at the beginning of this post.
    * Use 64-bit architecture servers and dedicated storage for each. This will improve performance.
    * Gluster recommends using client-server replication instead of server-server replication; however, I believe there are advantages to using server-server replication and freeing the client from doing anything other than what a client does: access data.
    * Release 4.0 has added HA as a translator, so IP addresses can now also be used as elements for failover. With this new translator, having an internal DNS to resolve a FQDN for the cluster becomes a non-issue, since you will be able to use the HA IP address of the cluster instead.
    * Use CARP or Heartbeat on the client side to give you additional HA on the client side.

Finally a setup like this gives you an inexpensive way of creating a cluster without the cost of a SAN for example. The Gluster-Wiki has some examples of people using it in production environments.

http://blog.miguelsarmiento.com/ ... lusterfs-and-centos

REFERENCE
http://www.thismail.org/bbs/viewthread.php?tid=3180