Friday, January 22, 2010

Playing with DRBD (Replication)

SkyHi @ Friday, January 22, 2010
Over the past 4+ years I've worked towards moving the infrastructure at work from isolated physical systems to a centralized storage (SAN) and virtualized paradigm. I'm happy to say we are definitely a "Virtualize First" and SAN shop now. A conservative comparison of separate physical systems with local storage to the current SAN/virtualized environment shows a nearly 50% cost savings on equipment alone. The benefits of extra rack space, lower power and cooling costs, I/O and compute capacity, etc. are nice as well.

This evolution has a logical next step. Now that we operate in a very centralized fashion, I can (more easily) begin examining solutions for replication. The base goal was to replicate the SAN to mitigate a 100% loss of the primary server room without requiring an enormous amount of time restoring from tape. Tapes are great point-in-time snapshots, like photographs, but trying to rebuild your entire set of memories by looking through photographs would be very time consuming (and photographs degrade). Replication is not a replacement for good backups.

As described in a previous post, my SAN is Linux based, much akin to an Openfiler solution. It provides NFS storage to ESXi servers for VM images and iSCSI storage to the VMs that need data volumes. All physical drives (as presented by the RAID controllers) are sliced up using LVM. It has been humming along without issue since installation in February 2009.

I have known of the existence of DRBD for several years but have not been in a position to use it. In short, DRBD is a block device you layer into your device chain, just like LVM. Its specialty is taking all the original block-level changes, keeping track of them, and sending them over to another system where they can be duplicated. The DRBD website is excellent, and I highly suggest spending a few minutes there. DRBD has a few very nice traits that I'd like to highlight. First off, it is smart (and dumb?). It works at the block level, knows which blocks may be out of sync, and will only send those blocks across the wire - it knows nothing of filesystems, files, etc. Secondly, it can be non-destructively added to existing data volumes. There's no need to backup/install/restore. DRBD is open source and freely available - but its creators and primary maintainers, Linbit, offer commercial support and have been around for a while. Linbit also offers a closed source product, DRBD Proxy, that is designed for long haul, high latency (200ms) connections or replication between more than 2 nodes. If you want to replicate outside of a LAN using DRBD, you'll most likely want it. DRBD is also 'good friends' with Heartbeat for high availability / failover situations.
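To make the "only send what changed" idea concrete, here is a toy sketch in Python. It is not how DRBD is implemented; the class name, the 4 KB block size, and the in-memory "devices" are all made up for illustration. It only shows the general technique of remembering which blocks are dirty and copying just those to the peer.

```python
BLOCK_SIZE = 4096  # bytes per tracked block; illustrative value, not DRBD's real granularity


class TrackedDevice:
    """Toy block device that remembers which blocks the peer has not seen yet."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.dirty = set()  # indices of blocks changed since the last resync

    def write(self, index, data):
        # Pad or truncate to one full block; the device sees opaque bytes, never files.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
        self.dirty.add(index)

    def resync(self, peer):
        """Copy only the dirty blocks to the peer and return how many were sent."""
        sent = 0
        for index in sorted(self.dirty):
            peer.blocks[index] = self.blocks[index]
            sent += 1
        self.dirty.clear()
        return sent


primary = TrackedDevice(num_blocks=1024)
replica = TrackedDevice(num_blocks=1024)

primary.write(3, b"vm image data")
primary.write(700, b"sql page")
primary.write(3, b"vm image data v2")  # rewriting a block does not add a second entry

blocks_sent = primary.resync(replica)
print(blocks_sent, "of", len(primary.blocks), "blocks sent")
```

Three writes touched two distinct blocks, so only two blocks cross the "wire" and the replica ends up identical to the primary.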

I set up a couple of CentOS x64 based VMs for testing. DRBD is available via the standard CentOS repositories, but it is naturally a bit behind the current version available directly from the DRBD website. The download is small, and if you have a basic compiler toolchain and the kernel-devel package, the build / install is quick and painless (make rpm). Did I already mention the DRBD website documentation is fantastic? Really, go read it. The configuration required to get DRBD working is quite minimal, although there are lots of options to fine tune its operation. If your data's rate of change is very high, you will really want to have Gigabit connectivity between your nodes. What you'll find is that your DRBD devices will only write about as fast as the data can get across the wire (assuming your drives can outrun wire speed). If you need more than wire speed and your drives are fast, take a look at the DRBD Proxy product. I spent a fair amount of time in different scenarios to see how DRBD would act and what to do as an admin in those situations. Like many things, with a little bit of time and reading, DRBD was easy to work with.
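A quick back-of-the-envelope calculation shows why the link matters. This is only a rough sketch: the function names are mine, the 90% link efficiency is an assumed allowance for protocol overhead, and the disk and data figures are made-up examples rather than measurements from my systems. It assumes writes are not acknowledged until the peer has the data, so throughput is capped by the slower of the disk and the network.

```python
def link_mb_per_s(link_mbit_s, efficiency=0.9):
    """Usable link throughput in MB/s; efficiency is an assumed overhead allowance."""
    return link_mbit_s / 8 * efficiency


def effective_write_mb_per_s(disk_mb_s, link_mbit_s, efficiency=0.9):
    """Replicated writes go no faster than the slower of disk and network."""
    return min(disk_mb_s, link_mb_per_s(link_mbit_s, efficiency))


def resync_hours(changed_gb, link_mbit_s, efficiency=0.9):
    """Hours needed to push changed_gb of out-of-sync data across the link."""
    seconds = changed_gb * 1024 / link_mb_per_s(link_mbit_s, efficiency)
    return seconds / 3600


# Hypothetical array that can sustain 300 MB/s locally.
for name, mbit in (("Fast Ethernet", 100), ("Gigabit", 1000)):
    rate = effective_write_mb_per_s(300, mbit)
    hours = resync_hours(500, mbit)
    print(f"{name}: ~{rate:.0f} MB/s writes, ~{hours:.1f} h to resync 500 GB")
```

With these assumed numbers, the same array is limited to roughly a tenth of the write rate on Fast Ethernet compared with Gigabit, and a large resync stretches from hours into most of a day.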

So what was I doing with all this again? The base goal was to replicate the SAN to mitigate a 100% loss of the server room. Since the SAN literally contains everything (VMs, SQL, Exchange databases, file shares), this was a fairly simple move that captures the entire datacenter on another system. To backpedal a bit, my environment is modest in size by any modern measure, but still just as important. That 'size', centralized storage, and a geographically large site made placing the replica system in a local but 'distant' (fiber connected) building a perfect option. The replica runs ESXi with a CentOS VM running DRBD to replicate the data. Why ESXi on the host? What this more or less creates is my datacenter-in-a-box, transportable if needed. The CentOS VM will provide NFS access back to the host ESXi for access to all the server VMs, which in turn will use iSCSI to access their data. ESXi virtual switches let me create matching, non-routed networks local to the replica host for the NFS and iSCSI traffic, meaning zero reconfiguration of the production server VMs. This isn't meant to be a powerhouse / failover solution. What it is, is a very cost effective solution to a worst case situation that hopefully doesn't occur. If the worst were to occur, some scripting magic transforms the replica to production status. When a new production environment is established, DRBD can be used to mirror the data back to it, enabling a transition with very little downtime.