There are various solutions for replicating static files and dynamic web sites across the globe. Replicating a set of static files is pretty easy.
Sample Setup
content.example.com (eth1: 202.54.1.2)
    |
    +-- VPN/intranet --> us.example.com (eth1: 67.1.2.3)
    +-- VPN/intranet --> uk.example.com (eth1: 87.1.2.3)
    +-- VPN/intranet --> jp.example.com (eth1: 123.1.2.3)

Directory layout under /var/www/corporate_lan/:
    /css/
    /images/
    /js/
    /php_cgi/
    /perl_cgi/
    /java_app/
    /python_app1/
Where,
- All servers run the same version of UNIX or Linux and Apache.
- The DocumentRoot is the same on all servers.
- content.example.com - Your main file server. Update or upload all static files here only; do not upload or create files on the other servers. From this server you push updates, or mirror directories, to the rest of the nodes.
- us.example.com - Your USA-based web server. This server syncs with (mirrors directories from) the upstream server content.example.com.
- jp.example.com - Your Japan-based web server. This server syncs with (mirrors directories from) the upstream server content.example.com.
- uk.example.com - Your UK-based web server. This server syncs with (mirrors directories from) the upstream server content.example.com.
- All offices are connected using a secure VPN or an intranet, a private computer network that uses Internet Protocol technologies to securely share any part of an organization's network operating system.
Solution #1: Mirroring Using rsync
You can use the rsync application to synchronize files and directories from content.example.com to other locations such as us.example.com; rsync minimizes data transfer by using delta encoding when appropriate. rsync copies directory trees recursively and can compress data in transit. rsync must be installed on all servers. Type the following commands on content.example.com to replicate /var/www/corporate_lan/ to all three servers:
rsync -av /var/www/corporate_lan root@us.example.com:/var/www/
rsync -av /var/www/corporate_lan root@uk.example.com:/var/www/
rsync -av /var/www/corporate_lan root@jp.example.com:/var/www/
To replicate only the /var/www/corporate_lan/css directory, enter:
rsync -av /var/www/corporate_lan/css root@us.example.com:/var/www/corporate_lan/
rsync -av /var/www/corporate_lan/css root@uk.example.com:/var/www/corporate_lan/
rsync -av /var/www/corporate_lan/css root@jp.example.com:/var/www/corporate_lan/
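When the source path has no trailing slash, rsync copies the directory itself into the destination, so the destination should be the source's parent directory; otherwise css/ would land in /var/www/css on the mirrors. The sketch below is a hypothetical helper (rsync_cmd is not part of rsync) that only prints the command it would run, which makes the rule easy to check before anything is copied. The docroot value is taken from this setup; adjust it for yours.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the rsync command that copies one
# directory under the DocumentRoot to the same place on a remote server.
docroot="/var/www/corporate_lan"

rsync_cmd() {
    local host="$1" sub="${2:-}"
    local src="${docroot}${sub:+/$sub}"    # e.g. /var/www/corporate_lan/css
    # Without a trailing slash, rsync recreates the last path component of
    # src inside the destination, so the destination is src's parent dir.
    local dest="root@${host}:$(dirname "$src")/"
    echo rsync -av "$src" "$dest"
}

# Print the css commands for all three mirrors
for h in us.example.com uk.example.com jp.example.com; do
    rsync_cmd "$h" css
done
```

Calling rsync_cmd with no subdirectory reproduces the whole-tree commands (destination /var/www/). Once the printed commands look right, replace echo with the real call or paste the output into a shell.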
--delete option
The --delete option removes files from the receiving servers when they no longer exist under /var/www/corporate_lan on content.example.com. For example, say you delete a file on content.example.com:
# rm /var/www/corporate_lan/images/new_logo.png
To remove the deleted file from all the other servers as well, i.e. to keep an exact mirror of content.example.com, enter:
rsync -av --delete /var/www/corporate_lan root@us.example.com:/var/www/
rsync -av --delete /var/www/corporate_lan root@uk.example.com:/var/www/
rsync -av --delete /var/www/corporate_lan root@jp.example.com:/var/www/
The -a option works as follows:
- Recurse into directories (-r)
- Copy symlinks as symlinks (-l)
- Preserve file permissions (-p)
- Preserve group ownership (-g) (so make sure the same user and group names exist on all servers)
- Preserve owner (-o) (you need to run rsync as root)
- Preserve modification times (-t)
- Preserve device and special files (-D)
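Because --delete removes files on every mirror, it helps to know in advance what it will remove. rsync's own --dry-run (-n) option, combined with -v and --delete, reports this without changing anything. The sketch below only illustrates the idea: a hypothetical helper, stale_files (not part of rsync), compares two local directories and lists regular files present in the mirror copy but missing from the source. It ignores directories, symlinks and exclude rules, and it is not how rsync works internally.

```shell
#!/bin/bash
# Hypothetical helper (not part of rsync): print the regular files that
# exist under DEST but not under SRC, i.e. what "rsync --delete" would
# remove from DEST. Directories, symlinks and exclude rules are ignored.
stale_files() {
    local src="$1" dest="$2" src_list dest_list
    src_list=$(mktemp)
    dest_list=$(mktemp)
    # comm needs both inputs sorted with the same collation, hence LC_ALL=C
    (cd "$src" && find . -type f) | LC_ALL=C sort > "$src_list"
    (cd "$dest" && find . -type f) | LC_ALL=C sort > "$dest_list"
    # -1 and -3 hide lines only in SRC and lines in both; what is left
    # are paths found only in DEST
    LC_ALL=C comm -13 "$src_list" "$dest_list"
    rm -f "$src_list" "$dest_list"
}

# Demo: two throwaway directories stand in for content.example.com and a mirror
src=$(mktemp -d)
dest=$(mktemp -d)
trap 'rm -rf "$src" "$dest"' EXIT
mkdir -p "$src/images" "$dest/images"
touch "$src/images/logo.png" "$dest/images/logo.png" "$dest/images/new_logo.png"
stale_files "$src" "$dest"    # prints ./images/new_logo.png
```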
Add the -z option to compress file data during the transfer:
rsync -z -av --delete /var/www/corporate_lan root@us.example.com:/var/www/
Use --compress-level=NUM to set the compression level explicitly:
rsync -z --compress-level=5 -av --delete /var/www/corporate_lan root@us.example.com:/var/www/
Excluding files
You can exclude files as follows:
rsync -z --compress-level=5 -av --delete --exclude='cache/*' --exclude='*~' /var/www/corporate_lan root@us.example.com:/var/www/
You can also put the patterns in a file (/root/mirror.exclude), one pattern per line:
cache/*
/dev/
/.conf/
*~
The --exclude-from=/root/mirror.exclude option reads exclude patterns from /root/mirror.exclude:
rsync -z --compress-level=5 -av --delete --exclude-from=/root/mirror.exclude /var/www/corporate_lan root@us.example.com:/var/www/
Sample rsync server mirroring shell script
You can create a shell script (say /root/mirror.dirs) to mirror the directories and files on a schedule that suits you, for example every 30 minutes:
#!/bin/bash
# Usage: Mirror directories and files to our US, UK and Japan based servers.
# --------------------------------------------------------------------------
_upstream="/var/www/corporate_lan"
_servers="root@us.example.com:/var/www/ root@uk.example.com:/var/www/ root@jp.example.com:/var/www/"
_rsync="/usr/bin/rsync"
_exclude="/root/mirror.exclude"
_log="/var/log/rsync_mirror.log"
# Use the exclude list only if the pattern file exists
_opts=""
[ -f "${_exclude}" ] && _opts="--exclude-from=${_exclude}"
# $_servers and $_opts are left unquoted on purpose so they split into words
for e in $_servers
do
    $_rsync -z -a --delete $_opts "$_upstream" "$e"
done &>"$_log"
Make the script executable (chmod +x /root/mirror.dirs) and run it from root's crontab, for example once an hour:
@hourly /root/mirror.dirs
How Do I Call /root/mirror.dirs As Soon As a New Static File Is Uploaded To /var/www/corporate_lan?
You can use incron, the inotify cron daemon, which monitors filesystem events and executes commands such as /root/mirror.dirs. Add the following entries to root's incrontab (incrontab -e):
/var/www/corporate_lan/css/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
/var/www/corporate_lan/images/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
/var/www/corporate_lan/js/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/mirror.dirs
See how to install and configure incron on Linux-based systems.
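When incron fires on every write, a new run of the mirror script can start while an earlier one (from incron or the hourly cron job) is still copying. One common way to avoid overlapping runs is flock(1) from util-linux. The sketch below is a hypothetical wrapper, run_locked, that runs a command only if it can take a non-blocking lock and otherwise skips the run; the function name and the lock file path in the example are assumptions, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command only if no other process holds the
# lock file; otherwise skip this run instead of waiting.
# Requires flock(1) from util-linux.
run_locked() {
    local lock="$1"
    shift
    (
        # -n: fail immediately if the lock is already held
        flock -n 9 || { echo "lock $lock busy, skipping run" >&2; exit 1; }
        "$@"
    ) 9>"$lock"
}

# Example (hypothetical lock path):
# run_locked /var/lock/mirror.dirs.lock /root/mirror.dirs
```

flock can also wrap a command directly, which is handy in a crontab entry: flock -n /var/lock/mirror.dirs.lock /root/mirror.dirs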
Solution #2: Mirroring Using unison
Unison synchronizes files between a server called content.example.com and another server called us.example.com, keeping the same version of the files on both. Unison allows two replicas of a collection of files and directories to be stored on different servers, modified separately, and then brought up to date by propagating the changes in each replica to the other. Unison must be installed on both servers. In this example, your webmaster can upload new_logo.png to us.example.com and it will be replicated to the rest of the servers. Similarly, if new_logo.png is deleted from content.example.com, it will be deleted from the rest of the servers. You can use it as follows:
# unison -batch /var/www/corporate_lan ssh://us.example.com//var/www/corporate_lan
To replicate only the /css/ directory, enter:
# unison -batch /var/www/corporate_lan/css ssh://us.example.com//var/www/corporate_lan/css
Sample unison server mirroring shell script
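Create a shell script called /root/unison.mirror.sh:
#!/bin/bash
# Directories to keep in sync between this server and each remote server
_paths="/var/www/corporate_lan/css /var/www/corporate_lan/images /var/www/corporate_lan/js"
_unison=/usr/bin/unison
_rservers="us.example.com uk.example.com jp.example.com"
# One unison session per (server, path) pair. ${p} starts with "/", so
# ssh://${s}/${p} becomes ssh://host//var/www/..., an absolute remote path.
for s in ${_rservers}
do
    for p in ${_paths}
    do
        ${_unison} -batch "${p}" "ssh://${s}/${p}"
    done
done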
Make the script executable (chmod +x /root/unison.mirror.sh) and run it once an hour from root's crontab:
@hourly /root/unison.mirror.sh
As explained earlier, you can also run this script on demand using incron (the inotify cron daemon):
/var/www/corporate_lan/css/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/unison.mirror.sh
/var/www/corporate_lan/images/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/unison.mirror.sh
/var/www/corporate_lan/js/ IN_CLOSE_WRITE,IN_CREATE,IN_DELETE /root/unison.mirror.sh
Solution #3: Use 3rd Party Cloud Based Solution
You can use third-party cloud computing infrastructure (such as Amazon S3, which automatically replicates your data across multiple Availability Zones) to keep all your data in sync across three different data centers. A discussion of cloud computing is beyond the scope of this FAQ; I recommend reading the documentation for AWS or similar third-party cloud services.
Solution #4: Replicate Content Using 3rd Party Content Delivery Networks (CDN)
A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, as opposed to all clients accessing the same central server, which avoids a bottleneck near that server. However, a CDN may not be faster than your own servers for intranet users, because the intranet is closer to them. If you have lots of home workers or remote clients, or your users are spread around the globe, a CDN may be a good idea. Note that a CDN puts your static files on someone else's network; if the files are sensitive (not for public view), do not host them on a CDN.
Solution #5: Content Replication Using Data Deduplication
Data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to that unique copy. Deduplication reduces the required storage capacity since only the unique data is stored. You can use open source projects such as opendedup and lessfs. These deduplication solutions were designed as filesystems for backup purposes; however, they can also be used as storage for virtual machine images and for replicating your data.
Conclusion
- Almost all replication schemes may require low-latency links between sites, so I recommend that you test all of the above methods and see what works for you.
- Open source tools usually work well as long as static files are not updated, uploaded, or deleted at very high rates (e.g. thousands of files per second).
- I have also avoided discussing commercial enterprise-grade solutions, such as NFS over a WAN using WAN accelerators from Riverbed, HP (EFS WAN accelerator) and others, due to cost.
REFERENCES
http://www.cyberciti.biz/faq/linux-unix-server-replicating-content-us-europe-asia-webservers/