Friday, November 12, 2010

Replicating Content Between USA, Japan (Asia) and UK (Europe) Webservers

SkyHi @ Friday, November 12, 2010
We have a corporate intranet for our web site. How do I replicate static content stored at /var/www/corporate_lan/, such as JavaScript files, CSS files, and images, between our USA, Japan and UK web servers running Apache under UNIX, CentOS or Red Hat Enterprise Linux?

Various solutions exist to replicate static files and dynamic web sites across the globe. Replicating a set of static files is pretty easy.

Sample Setup

                                                |  |
 eth1:                                +-----------------+
+----------------------+                        |
| |------------------------+
+----------------------+       VPN/intranet     | eth1:
   /                                            +-----------------+
   /                                            |  |
   |                                            +-----------------+
   +/var/www/corporate_lan/                     |
                          /css/                 | eth1:
                          /images/              +-----------------+
                          /js/                  |  |
                          /php_cgi/             +-----------------+
  1. All servers run the same version of UNIX or Linux and Apache.
  2. DocumentRoot is the same on all servers.
  3. - Your main file server. You need to update or upload all static files here only. Do not upload or create files on the other servers. You can now push updates or mirror directories from this server to the rest of the nodes.
  4. - Your USA based web server. This server will sync to (or mirror directories from) upstream server called
  5. - Your Japan based web server. This server will sync to (or mirror directories from) upstream server called
  6. - Your UK web server. This server will sync to (or mirror directories from) upstream server called
  7. All offices are connected using a secure VPN or an intranet - a private computer network that uses Internet Protocol technologies to securely share any part of an organization's network operating system.

Solution # 1: Mirroring Using rsync

You can use the rsync application to synchronize files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. rsync can copy directory contents and files using compression and recursion. You must install rsync on all servers. Type the following commands on the main file server to replicate /var/www/corporate_lan/ to all three servers:
rsync -av /var/www/corporate_lan
rsync -av /var/www/corporate_lan
rsync -av /var/www/corporate_lan
To replicate only /var/www/corporate_lan/css directory, enter:
rsync -av /var/www/corporate_lan/css
rsync -av /var/www/corporate_lan/css
rsync -av /var/www/corporate_lan/css
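The push from the main file server to each mirror can be sketched as a small loop. This is only an illustration under assumptions: the hostnames usa.example.com, japan.example.com and uk.example.com are hypothetical placeholders, the build_push_cmd helper is made up for this example, and the script prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: print (not run) one rsync command per mirror.
# The hostnames below are hypothetical placeholders, not the real servers.
src="/var/www/corporate_lan"
mirrors="usa.example.com japan.example.com uk.example.com"

build_push_cmd() {
    # $1 = local source directory, $2 = remote host
    # No trailing slash on the source: rsync recreates the directory
    # itself (corporate_lan) inside the destination /var/www/.
    printf 'rsync -av %s %s:/var/www/\n' "$1" "$2"
}

for host in $mirrors; do
    build_push_cmd "$src" "$host"
done
```

Note the trailing-slash rule: a source written as /var/www/corporate_lan/ (with the slash) copies the directory's contents, while /var/www/corporate_lan (without it) copies the directory itself into the destination.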

--delete option

The --delete option removes files from the mirrors that no longer exist in /var/www/corporate_lan on the main file server. For example, suppose you delete a file on the main file server:
# rm /var/www/corporate_lan/images/new_logo.png
To remove the deleted files from all the other servers, i.e. keep an exact mirror of the main file server, enter:
rsync -av --delete /var/www/corporate_lan
rsync -av --delete /var/www/corporate_lan
rsync -av --delete /var/www/corporate_lan
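Because --delete removes files on the receiving side, it may be worth previewing the effect first with rsync's --dry-run option. The following is a minimal sketch, assuming two throwaway local directories stand in for the source and a mirror; the preview_then_mirror helper is a name made up for this example.

```shell
#!/bin/bash
# Sketch: preview what --delete would remove, then apply it.
# Local throwaway directories stand in for the real source and mirror.
preview_then_mirror() {
    # $1 = source directory, $2 = mirror directory
    # --dry-run reports changes (including deletions) without making them.
    rsync -a --delete --dry-run --itemize-changes "$1/" "$2/"
    # The same command without --dry-run actually updates the mirror.
    rsync -a --delete "$1/" "$2/"
}

if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d) && dst=$(mktemp -d)
    touch "$src/logo.png" "$dst/logo.png" "$dst/new_logo.png"
    preview_then_mirror "$src" "$dst"
    ls "$dst"          # new_logo.png is no longer listed
    rm -rf "$src" "$dst"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```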
The -a (archive) option is shorthand for -rlptgoD and works as follows:
  • Recurse into directories
  • Copy symlinks as symlinks
  • Preserve file permissions
  • Preserve group ownership
  • Preserve owner (you need to run rsync as root on the receiving side, and should use the same usernames on all servers)
  • Preserve modification times
  • Preserve device and special files
You can compress file data during the transfer using the -z or --compress option:
rsync -z -av --delete /var/www/corporate_lan
The --compress-level=NUM option explicitly sets the compression level:
rsync -z --compress-level=5 -av --delete /var/www/corporate_lan

Excluding files

You can exclude files as follows:
rsync -z --compress-level=5 -av --delete --exclude='cache/*' --exclude='*~'  /var/www/corporate_lan
You can also store the patterns in a file (say /root/mirror.exclude), one pattern per line.
The --exclude-from=/root/mirror.exclude option reads the exclude patterns from that file:
rsync -z --compress-level=5 -av --delete --exclude-from=/root/mirror.exclude  /var/www/corporate_lan
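A pattern file for --exclude-from might look like the one written by the sketch below. The first two patterns come from the --exclude example above; the others are illustrative assumptions, and the write_exclude_file helper is a made-up name. By default the sketch writes to a temporary file so that nothing under /root is touched.

```shell
#!/bin/bash
# Sketch: write a sample exclude file. Patterns beyond cache/* and *~
# are illustrative assumptions, not taken from a real setup.
write_exclude_file() {
    # $1 = path of the pattern file to create
    cat > "$1" <<'EOF'
# one pattern per line; lines starting with # or ; are ignored by rsync
cache/*
*~
*.tmp
.svn/
EOF
}

exclude_file="${1:-$(mktemp)}"
write_exclude_file "$exclude_file"
echo "Wrote exclude patterns to $exclude_file:"
cat "$exclude_file"
```

To use it for real, pass the target path as the first argument, for example /root/mirror.exclude.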

Sample rsync server mirroring shell script

You can create a shell script (say /root/mirror.dirs) that mirrors the directories and files, and run it every 30 minutes or as per your requirements:
#!/bin/bash
# Usage: Mirror directories and files to our US, UK and Japan based server.
# NOTE: define _rsync, _upstream, _exclude, _log and _servers above this loop.
# --------------------------------------------------------------------------
for e in $_servers
do
        [ -f "${_exclude}" ] && _opts="--exclude-from=$_exclude"
        $_rsync -z -a --delete $_opts  "$_upstream" "$e"
done &>"$_log"
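For reference, a self-contained variant of such a script could look like the sketch below. Every value in it is an assumption: the three hosts are hypothetical, the variable names are chosen for this example, and RSYNC defaults to "echo rsync" so that the script only prints the commands it would run. Set RSYNC=/usr/bin/rsync (or wherever rsync lives on your system) to perform real transfers.

```shell
#!/bin/bash
# Sketch of a complete mirror script. All values are assumptions:
# hypothetical hosts, and RSYNC defaulting to a dry "echo rsync".
RSYNC="${RSYNC:-echo rsync}"
UPSTREAM="/var/www/corporate_lan"
EXCLUDE="${EXCLUDE:-/root/mirror.exclude}"
SERVERS="${SERVERS:-usa.example.com:/var/www/ japan.example.com:/var/www/ uk.example.com:/var/www/}"

mirror_all() {
    local opts="" dest
    # Use the exclude file only if it exists.
    [ -f "$EXCLUDE" ] && opts="--exclude-from=$EXCLUDE"
    for dest in $SERVERS; do
        # $RSYNC and $opts are left unquoted on purpose so they split into words.
        $RSYNC -z -a --delete $opts "$UPSTREAM" "$dest" || echo "FAILED: $dest" >&2
    done
}

mirror_all
```

Reporting a failure per destination and continuing means one unreachable office does not stop the other mirrors from being updated.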

Run the script once an hour using cron, i.e. mirror the servers once an hour:
@hourly /root/mirror.dirs

How Do I Call /root/mirror.dirs As Soon As A New Static File Is Uploaded In /var/www/corporate_lan?

You can use incron (the inotify cron daemon) to monitor filesystem events and execute the /root/mirror.dirs script. Add the following entries to the incrontab:
/var/www/corporate_lan/css/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
/var/www/corporate_lan/images/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
/var/www/corporate_lan/js/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
See how to configure and install inotify under Linux based systems.
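As far as I know, an incrontab entry watches only the directory it names and not its subdirectories, so nested directories need their own entries. The sketch below prints one entry per directory under a tree. The gen_incron_entries helper is a made-up name, and the demonstration uses a throwaway directory tree rather than the real DocumentRoot.

```shell
#!/bin/bash
# Sketch: print one incrontab-style line per directory under a tree.
# Assumes directory names contain no spaces or newlines.
gen_incron_entries() {
    # $1 = top-level directory to watch, $2 = script to run on events
    find "$1" -type d | sort | while read -r dir; do
        printf '%s IN_CLOSE_WRITE,IN_CREATE,IN_DELETE %s\n' "$dir" "$2"
    done
}

# Demonstration on a throwaway tree instead of the real DocumentRoot.
demo=$(mktemp -d)
mkdir -p "$demo/css" "$demo/images/icons" "$demo/js"
gen_incron_entries "$demo" /root/mirror.dirs
rm -rf "$demo"
```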

Solution # 2: Mirroring Using unison

Unison synchronizes files between two servers while keeping the same version of the files on both. It allows two replicas of a collection of files and directories to be stored on different servers, modified separately, and then brought up to date by propagating the changes in each replica to the other. In this example, your webmaster can upload new_logo.png to any one server and it will be replicated to the rest. Similarly, if new_logo.png is deleted from one server, it will be deleted from the rest. You can use it as follows:
# unison -batch /var/www/corporate_lan ssh://
To replicate just the /css/ part, enter:
# unison -batch /var/www/corporate_lan/css ssh://

Sample unison server mirroring shell script

Create a shell script called /root/
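# NOTE: define _unison (path to unison) and _rserver (remote host) above.
_paths="/var/www/corporate_lan/css \
/var/www/corporate_lan/images \
/var/www/corporate_lan/js"
for p in ${_paths}
do
 ${_unison} -batch "${p}"  "ssh://${_rserver}/${p}"
done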
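Since unison reconciles two replicas at a time, keeping several servers in step means running it once per peer. A minimal sketch of that idea follows. The peer hostnames are hypothetical, the variable and function names are chosen for this example, and UNISON defaults to "echo unison" so that the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch: sync several paths with several peers, one pair at a time.
# Hosts are hypothetical; UNISON defaults to "echo unison" (print only).
UNISON="${UNISON:-echo unison}"
PEERS="${PEERS:-japan.example.com uk.example.com}"
SYNC_PATHS="${SYNC_PATHS:-/var/www/corporate_lan/css /var/www/corporate_lan/images /var/www/corporate_lan/js}"

sync_all() {
    local host p
    for host in $PEERS; do
        for p in $SYNC_PATHS; do
            # "ssh://host/" followed by "/abs/path" yields a double slash,
            # which unison reads as an absolute path on the remote side.
            $UNISON -batch "$p" "ssh://${host}/${p}"
        done
    done
}

sync_all
```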

Run the script once an hour using cron, i.e. mirror the servers once an hour:
@hourly /root/
As explained earlier, you can also call this script on demand using the inotify cron daemon:
/var/www/corporate_lan/css/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/
/var/www/corporate_lan/images/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/
/var/www/corporate_lan/js/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/

Solution # 3: Use 3rd Party Cloud Based Solution

You can use 3rd party cloud computing infrastructure (such as Amazon S3, where your data is automatically replicated across multiple Availability Zones) to sync all your data in three different data centers. A discussion of cloud computing is beyond the scope of this FAQ; I recommend reading about AWS or similar 3rd party cloud services.

Solution # 4: Replicate Content Using 3rd Party Content Delivery Networks (CDN)

A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, as opposed to all clients accessing the same central server, so as to avoid a bottleneck near that server. However, a CDN may not offer a speed advantage, since the intranet is closer to the user. If you have lots of home workers or remote clients, or your users are spread around the globe, it might be a good idea to use a CDN. Please note that you are putting your static files on someone else's network, so if the files are important (not for public view), do not host them using a CDN.

Solution # 5: Content Replication Using Data Deduplication

Data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to the unique copy of data. Deduplication is able to reduce the required storage capacity since only the unique data is stored. You can use open source projects such as opendedup and lessfs. Data deduplication solutions have been designed as filesystems for backup purposes. However, they can also be used as storage for virtual machine images and for replicating your data.

Fig.01: Data deduplication using opendedup


  1. Almost all replication schemes may require low latency, so I recommend that you test all of the above methods and see what works for you.
  2. Usually, open source tools work well when static files are not updated, uploaded and deleted at rapid rates (e.g. 1000s of files per second).
  3. I've also avoided discussing commercial enterprise grade solutions, such as NFS over WAN using WAN accelerators from Riverbed, HP EFS WAN accelerator and others, due to cost issues.