Friday, January 22, 2010

DRBD split brain recovery

SkyHi @ Friday, January 22, 2010

Manual split brain recovery

DRBD detects split brain at the time connectivity becomes available again and the peer nodes exchange the initial DRBD protocol handshake. If DRBD detects that both nodes are (or were at some point, while disconnected) in the primary role, it immediately tears down the replication connection. The tell-tale sign of this is a message like the following appearing in the system log:

Split-Brain detected, dropping connection!

After split brain has been detected, one node will always have the resource in a StandAlone connection state. The other might either also be in the StandAlone state (if both nodes detected the split brain simultaneously), or in WFConnection (if the peer tore down the connection before the other node had a chance to detect split brain).
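To see which of these situations a node is in, a small shell sketch can combine the two checks: look for the tell-tale message in the system log, then read the resource's connection state from /proc/drbd. The log location (SYSLOG, defaulting to /var/log/messages) varies by distribution and is an assumption, as is the function name check_split_brain. The minor number is the num in /dev/drbdnum. The grep does not tell resources apart, so with several resources you would still check each minor.

```sh
#!/bin/sh
# Sketch: look for signs of split brain on this node.
# Assumptions: kernel messages land in $SYSLOG (distribution-dependent),
# and the caller knows the resource's minor number.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
SYSLOG=${SYSLOG:-/var/log/messages}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Report whether split brain was logged, and the current state of minor $1.
check_split_brain() {
    minor=$1
    if grep -q 'Split-Brain detected' "$SYSLOG"; then
        echo "split brain logged in $SYSLOG"
    fi
    cs=$(drbd_field "$minor" cs)
    case $cs in
        StandAlone|WFConnection) echo "minor $minor: cs:$cs (consistent with split brain)" ;;
        *)                       echo "minor $minor: cs:$cs" ;;
    esac
}
```

For example, check_split_brain 0 on each node, where minor 0 is hypothetical.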

At this point, unless you configured DRBD to automatically recover from split brain, you must manually intervene by selecting one node whose modifications will be discarded (this node is referred to as the split brain victim). On the victim, enter the following commands, where resource is the name of the affected DRBD resource:

drbdadm secondary resource
drbdadm -- --discard-my-data connect resource
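Wrapped into a shell function, the victim's side might look like the sketch below. The function name sb_victim, the DRBDADM and PROC_DRBD overrides, and the rule that it only acts on a StandAlone resource are assumptions of this sketch, not DRBD behavior. Setting DRBDADM="drbdadm --stacked" follows the stacked-resource advice in this section; DRBDADM="echo drbdadm" gives a dry run.

```sh
#!/bin/sh
# Sketch: run the victim's side of manual split brain recovery.
# Assumptions: the caller passes the resource name and its minor number.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Discard this node's modifications and reconnect to the survivor.
sb_victim() {
    res=$1 minor=$2
    # Conservative choice of this sketch: only act on a StandAlone resource.
    if [ "$(drbd_field "$minor" cs)" != StandAlone ]; then
        echo "minor $minor is not StandAlone; not touching it" >&2
        return 1
    fi
    $DRBDADM secondary "$res" || return 1
    $DRBDADM -- --discard-my-data connect "$res"
}
```

For example, sb_victim r0 0 on the victim, where the resource r0 and minor 0 are hypothetical.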

On the other node (the split brain survivor), if its connection state is also StandAlone, you would enter:

drbdadm connect resource

You may omit this step if the node is already in the WFConnection state; it will then reconnect automatically.
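The survivor's decision (connect if StandAlone, do nothing if WFConnection) can be sketched the same way. sb_survivor and its arguments are hypothetical names, and treating every other state as unexpected is a choice of this sketch.

```sh
#!/bin/sh
# Sketch: run the survivor's side of manual split brain recovery.
# Assumptions: the caller passes the resource name and its minor number.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Reconnect only if StandAlone; a WFConnection node reconnects by itself.
sb_survivor() {
    res=$1 minor=$2
    cs=$(drbd_field "$minor" cs)
    case $cs in
        StandAlone)   $DRBDADM connect "$res" ;;
        WFConnection) echo "minor $minor is in WFConnection; it will reconnect by itself" ;;
        *)            echo "minor $minor is in unexpected state cs:$cs" >&2; return 1 ;;
    esac
}
```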

If the resource affected by the split brain is a stacked resource, use drbdadm --stacked instead of just drbdadm.

Upon connection, your split brain victim immediately changes its connection state to SyncTarget, and has its modifications overwritten by the remaining primary node.


The split brain victim is not subjected to a full device synchronization. Instead, it has its local modifications rolled back, and any modifications made on the split brain survivor propagate to the victim.

After re-synchronization has completed, the split brain is considered resolved and the two nodes form a fully consistent, redundant replicated storage system again.
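If you want a script to wait for the resynchronization to finish, one option is to poll /proc/drbd until the resource reports cs:Connected again. This is a sketch: wait_resynced, its poll count and the POLL_INTERVAL variable are assumptions, and it looks only at the connection state, not at the disk states.

```sh
#!/bin/sh
# Sketch: wait until a resynchronizing minor is back to cs:Connected.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Poll minor $1 at most $2 times (default 120), POLL_INTERVAL seconds apart.
wait_resynced() {
    minor=$1 tries=${2:-120}
    while [ "$tries" -gt 0 ]; do
        cs=$(drbd_field "$minor" cs)
        if [ "$cs" = Connected ]; then
            echo "minor $minor: resync complete"
            return 0
        fi
        echo "minor $minor: cs:$cs"
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    return 1
}
```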


Checking your Secondary’s integrity

Suppose you run into the following situation: you’ve suffered the effects of a nasty bug hidden somewhere deep inside your network stack. Say your NIC screwed up checksum offloading, or some network driver caused a kernel panic. Either way, you have reason to believe that some portion of the data you sent over that wire can’t be trusted. Suppose further that you ran DRBD over the affected network link.

How can you check the integrity of the replicated data without taking down your Primary, and thus without interrupting service? Here’s how.

With DRBD+ and DRBD 8.2.3 or later

With DRBD+ and DRBD in versions 8.2.3 and above, you simply issue drbdadm verify resource. No service interruption, no loss of redundancy in the process, no fiddling with the Secondary at all. See this post for details.
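drbdadm verify runs in the background, so a script that wants the result has to wait for it. The sketch below rests on assumptions worth checking against your version: that a verify-alg is configured for the resource, that /proc/drbd shows an oos: (out of sync) counter on the statistics line following the resource's status line (as DRBD 8.3 does), and that the verify has finished once the connection state is back to Connected. verify_resource and drbd_oos are hypothetical names.

```sh
#!/bin/sh
# Sketch: start an online verify and report the out-of-sync counter.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Print the oos: value from the lines after minor $1's status line,
# stopping at the next minor's status line.
drbd_oos() {
    awk -v m="$1:" '
        $1 == m { found = 1; next }
        found && $1 ~ /^[0-9]+:$/ { exit }
        found { for (i = 1; i <= NF; i++) if (index($i, "oos:") == 1) { print substr($i, 5); exit } }
    ' "$PROC_DRBD"
}

# Verify resource $1 (minor $2), polling at most $3 times (default 720).
# Returns 0 if nothing is out of sync, 2 if something is, 1 on error.
verify_resource() {
    res=$1 minor=$2 tries=${3:-720}
    $DRBDADM verify "$res" || return 1
    while [ "$(drbd_field "$minor" cs)" != Connected ]; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            echo "minor $minor: not back to Connected, giving up" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-5}"
    done
    oos=$(drbd_oos "$minor")
    if [ -z "$oos" ]; then
        echo "minor $minor: no oos: counter in $PROC_DRBD" >&2
        return 1
    elif [ "$oos" -eq 0 ]; then
        echo "minor $minor: no out-of-sync blocks found"
    else
        echo "minor $minor: oos:$oos"
        return 2
    fi
}
```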

With DRBD 0.7

If you’re still on DRBD 0.7, your course of action is this:

  1. Log on to your current Secondary. Stop all Heartbeat services; this ensures that Heartbeat doesn’t fail over while you conduct the tests that follow. Henceforth, I’ll refer to the node where Heartbeat is shut down, and where you’ll carry out the steps described here, as the offline node. The node where your service continues to run is, unsurprisingly, the online node.
  2. Check /proc/drbd to make sure that all of your DRBDs are now in Secondary mode.
  3. Run drbdadm disconnect resource, where resource is the name of the DRBD resource whose integrity you are about to investigate.
  4. Check /proc/drbd again; that resource should now show cs:StandAlone as its connection state.
  5. Make the resource Primary. DRBD will allow this due to the resource’s disconnected state: drbdadm primary resource
  6. Now you can run your integrity check. Typically this would involve running fsck /dev/drbdnum (where num is the device’s minor number), followed by fsck -f /dev/drbdnum, to check the integrity of the file system on the device. You may also mount the device and run some application-specific tests.
  7. When you have completed checking, stop all applications using the resource. Unmount the device.
  8. Make your resource Secondary again on the offline node: drbdadm secondary resource
  9. Double-check /proc/drbd; you absolutely want to make sure that the device is Secondary.
  10. Repeat. :-)
  11. Now you can connect the resource again: drbdadm connect resource
  12. DRBD will now recover from the split brain you deliberately created. You probably won’t notice unless you’re watching your kernel log, but that’s what happens behind the scenes. By the way, this behavior changed in DRBD 8 for a reason; see below for details.
  13. In /proc/drbd, you should now be able to observe that the device changes its connection state to SyncTarget, and that it is synchronizing with the online node again. Any changes you made on the offline node while it was Primary are discarded in the process, and any changes made on the online node during the connection interruption propagate to the offline node.
  14. When everything is in sync, the connection state changes to Connected. Your online node will still be Primary, and the node you had taken offline will be Secondary as before.
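Steps 2 to 5 and 8 to 11 lend themselves to a pair of shell functions, with your own checks (step 6) and the unmount (step 7) in between. This sketch assumes Heartbeat is already stopped (step 1), that you pass the resource name and its minor number, and that the local role is the first half of the st: field (DRBD 0.7) or ro: field (DRBD 8.3) in /proc/drbd. It checks only the one minor you name, not all of your DRBDs; offline_check_begin and offline_check_end are hypothetical names.

```sh
#!/bin/sh
# Sketch of steps 2-5 and 8-11, run on the offline node only.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# Print the local role of minor $1, e.g. "Secondary" from "Secondary/Primary".
drbd_role() {
    r=$(drbd_field "$1" st)
    [ -n "$r" ] || r=$(drbd_field "$1" ro)
    echo "${r%%/*}"
}

# Steps 2-5: confirm Secondary, disconnect, confirm StandAlone, go Primary.
offline_check_begin() {
    res=$1 minor=$2
    [ "$(drbd_role "$minor")" = Secondary ] || { echo "minor $minor is not Secondary" >&2; return 1; }
    $DRBDADM disconnect "$res" || return 1
    [ "$(drbd_field "$minor" cs)" = StandAlone ] || { echo "minor $minor is not StandAlone" >&2; return 1; }
    $DRBDADM primary "$res"
}

# Steps 8-11: go Secondary, double-check the role, then reconnect.
offline_check_end() {
    res=$1 minor=$2
    $DRBDADM secondary "$res" || return 1
    [ "$(drbd_role "$minor")" = Secondary ] || { echo "minor $minor is still Primary" >&2; return 1; }
    $DRBDADM connect "$res"
}
```

On DRBD 8 before 8.2.3, the final connect in offline_check_end would become $DRBDADM -- --discard-my-data connect "$res" on the offline node, as the DRBD 8 section explains.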

If your checks on the offline node detected any suspicious inconsistencies, you should now issue drbdadm invalidate resource on your Secondary, which will force a full sync of resource from the Primary (which has the reliable data). If you detected no inconsistencies, you have successfully validated your data and may retire for a coffee break.
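The decision in this paragraph can also go into a small function. finish_check and its "bad" verdict argument are conventions of this sketch, not DRBD options; it refuses to act until the resource is Connected again (step 14), and it only invalidates a node whose local role is Secondary.

```sh
#!/bin/sh
# Sketch: act on the result of the offline checks once resync has finished.
PROC_DRBD=${PROC_DRBD:-/proc/drbd}
DRBDADM=${DRBDADM:-drbdadm}

# Print the value of field $2 (e.g. "cs") on the /proc/drbd line for minor $1.
drbd_field() {
    awk -v m="$1:" -v f="$2:" '
        $1 == m { for (i = 2; i <= NF; i++) if (index($i, f) == 1) print substr($i, length(f) + 1) }
    ' "$PROC_DRBD"
}

# $1 = resource, $2 = minor, $3 = "bad" if the checks found inconsistencies.
finish_check() {
    res=$1 minor=$2 verdict=$3
    [ "$(drbd_field "$minor" cs)" = Connected ] || { echo "minor $minor has not finished resyncing" >&2; return 1; }
    role=$(drbd_field "$minor" st)
    [ -n "$role" ] || role=$(drbd_field "$minor" ro)
    [ "${role%%/*}" = Secondary ] || { echo "minor $minor is not Secondary; not invalidating" >&2; return 1; }
    if [ "$verdict" = bad ]; then
        # Forces a full sync of the resource from the Primary.
        $DRBDADM invalidate "$res"
    else
        echo "minor $minor: data validated"
    fi
}
```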

Needless to say, while you are conducting this integrity check, your service is temporarily not redundant. You may need to inform your boss (or customer) of that fact and seek their approval. Temporarily not being redundant still beats the heck out of temporarily being out of service, I might add.

With DRBD 8 (pre-8.2.3)

In DRBD 8, automatic split brain recovery is disabled by default: when your DRBD cluster detects split brain, DRBD simply disconnects. This is a deliberate change from DRBD 0.7 that many users asked for, since after split brain most people want manual control over the recovery. To manually restore connectivity after your deliberately induced split brain, do the following:

  • on the offline node: drbdadm -- --discard-my-data connect resource
  • on the online node: drbdadm connect resource

This replaces step 12 above. After this, DRBD will reconnect and selectively copy those blocks that differ between both nodes, making sure the offline node’s data is consistent with the online node again.

If you really really really want to (hint: you don’t), you can emulate DRBD 0.7’s split brain recovery behavior. To do that, add the following lines to the net section in your resource configuration:

net {
    after-sb-0pri discard-younger-primary;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
}

Did I mention you don’t want to do this? Anyway, you don’t want to do this. If you run into split brain, you want to know about it, and fix it manually. Really. Other options are available; make sure you understand their implications. Check the drbd.conf man page for details.