DRBD vs rsync

Vito Leung
Dec 29, 2020

We needed a solution to sync data between our NFS servers. Here is the thinking behind choosing between the top-heavy DRBD and rsync.

Scenario

We had a farm of NFS servers in production which needed replication for two purposes: 1) failover and 2) backup.

The PROS and CONS

Pros for DRBD:

  1. SYNCHRONOUS mode for syncing data just works. That is the magic of DRBD: a write has to succeed on the primary server and on all secondary servers before it is considered successful.
  2. Speed. DRBD sits at the bottom of the I/O stack, so its writes are blazing fast.
  3. DRBD comes with a set of user-space admin tools to help with setup and status checks.
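
For reference, synchronous mode is what DRBD calls protocol C. Below is a minimal sketch of a helper that renders a two-node resource file, assuming DRBD 8.4-style syntax. The resource name, host names, backing partition and addresses in the usage line are placeholders for illustration, not values from the setup described here, and the names after `on` would have to match each server's `uname -n`.

```shell
#!/bin/sh
# Render a two-node DRBD resource definition that uses protocol C
# (synchronous replication). Every argument is supplied by the caller;
# the port 7789 and device /dev/drbd0 are assumptions for this sketch.
render_drbd_resource() {
    res="$1"; disk="$2"
    host_a="$3"; addr_a="$4"
    host_b="$5"; addr_b="$6"
    cat <<EOF
resource $res {
  net {
    protocol C;   # a write completes only after both nodes have it on disk
  }
  on $host_a {
    device    /dev/drbd0;
    disk      $disk;
    address   $addr_a:7789;
    meta-disk internal;
  }
  on $host_b {
    device    /dev/drbd0;
    disk      $disk;
    address   $addr_b:7789;
    meta-disk internal;
  }
}
EOF
}
```

Something like `render_drbd_resource r0 /dev/sdb1 nfs-a 10.0.0.1 nfs-b 10.0.0.2 > /etc/drbd.d/r0.res` would then produce the same file on both servers, which removes at least one round of hand editing.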

Cons for DRBD:

  1. Setup is a b*itch, and I am just talking about two servers. There is a lot of file editing, checking of partition names and other manual tasks involved, and the ROI is not there to design a recipe or playbook to automate the setup. And if the file systems on the two servers differ, get ready to fdisk and/or dd. These are LARGE partitions, so the time adds up. There is also a step during the initial setup where the primary and secondary need to sync at the block level before use. For a 7TB partition, it took 24+ hours.
  2. Failing over and back is a b*itch. Those of you who deal with NFS know it can be a beast sometimes. Add to that the need to mount, umount, set or unset primary/secondary servers and verify, and I have yet to build confidence in a recipe or playbook that I would execute blindly.
  3. DRBD hides the partition on the destination servers, so it is basically blind faith that the data sync is really happening. When I first used DRBD this was really uncomfortable for me, and to this day I am still uncomfortable not being able to access the file system. In my nightmares I find out DRBD replication wasn’t working the one time I really need to fail over.
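
The blind-faith worry in the last point can be reduced a little by asking DRBD for its own view of the replication state. This is a rough sketch, assuming `drbdadm cstate` and `drbdadm dstate` print `Connected` and `UpToDate/UpToDate` for a healthy two-node resource, as they do on DRBD 8.4; the exact output differs between DRBD versions, so check yours first. The function name and the `DRBDADM` override variable are made up for this example.

```shell
#!/bin/sh
# DRBDADM can be overridden (e.g. with a stub) for testing.
DRBDADM="${DRBDADM:-drbdadm}"

# Return 0 only if the resource is connected and both disks are UpToDate.
# Return 1 if DRBD reports a degraded state, 2 if drbdadm itself fails.
drbd_in_sync() {
    res="$1"
    cstate="$("$DRBDADM" cstate "$res" 2>/dev/null)" || return 2
    dstate="$("$DRBDADM" dstate "$res" 2>/dev/null)" || return 2
    if [ "$cstate" = "Connected" ] && [ "$dstate" = "UpToDate/UpToDate" ]; then
        return 0
    fi
    echo "DRBD $res degraded: cstate=$cstate dstate=$dstate" >&2
    return 1
}
```

Run from cron or a monitoring agent, `drbd_in_sync r0` (with `r0` standing in for the real resource name) only tells you what DRBD believes. It is still not the same as being able to look at the files on the secondary.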

Pros for rsync:

  1. It’s on nearly every ‘nix system by default, so no messy installation is needed. rsync has not changed much in a long time, so there is no need to worry about feature and revision differences.
  2. For a Doubting Thomas like me, you can see the file system on the destination servers, inspect it and do with it as you please, all at your fingertips. This is actually an important factor to consider for easier failovers.
  3. As alluded to in the point above, failing over and back is simple because everything is done at the file system level. If a server falls out of sync, or if another server needs to be added to replication, a simple rsync command will catch things up.

Cons for rsync:

  1. Data will never be as up-to-date as with DRBD: rsync copies from an existing data source after the fact, whereas DRBD synchronous mode writes to all servers before deeming a commit successful.
  2. You need to write your own checks for validation and monitoring, as rsync does not provide any of that.
  3. You need to figure out the exact flags and options. This is not a big con as it’s a one-time tax to be paid. I’ll share mine in case it helps anyone:
rsync -qravzPSH --log-file=/var/log/rsync/rsync.log 10.19.1.128:/export/data/appone/projects/objects/ /export/data/appone/projects/objects --delete
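
The missing monitoring from the second con can be approximated with a thin wrapper. This is a sketch under assumptions, not what runs in production here: the function names, the stamp file and the `RSYNC` override variable are invented for illustration, and `find -mmin` is assumed to be available (it is in GNU and BSD find). The idea is to record the time of the last successful run and alert when that record gets too old.

```shell
#!/bin/sh
# RSYNC can be overridden (e.g. with a stub) for testing.
RSYNC="${RSYNC:-rsync}"

# Run one sync and touch a stamp file only if rsync exits successfully.
sync_and_stamp() {
    src="$1"; dest="$2"; stamp="$3"
    if "$RSYNC" -qazHS --delete "$src" "$dest"; then
        touch "$stamp"
    else
        rc=$?
        echo "rsync failed with exit code $rc" >&2
        return "$rc"
    fi
}

# Succeed only if the stamp exists and is at most max_min minutes old.
stamp_is_fresh() {
    stamp="$1"; max_min="$2"
    [ -f "$stamp" ] || return 1
    # find prints the path only when it was modified more than max_min minutes ago
    [ -z "$(find "$stamp" -mmin +"$max_min")" ]
}
```

A monitoring check could then call something like `stamp_is_fresh /var/run/rsync-objects.stamp 150` and alert on failure. The 150 minutes is a guess at "one missed run plus some slack" for a 2-hour schedule, and the stamp path is hypothetical.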

In the end

Even though it’s not absolutely critical for my system to have the sync delta at the microsecond level, we sucked it up and used DRBD for the main failover system to avoid needing to invent monitoring for rsync. I then have a second set of servers that rsync from the main server at 2-hour intervals. This covers my lack of confidence in an invisible file system under DRBD and gives me the flexibility to take the rsyncing servers out of rotation for other uses (e.g. testing).

rsync is ridiculously fast. My sync starts at the top level of a fairly deeply nested folder structure, and the folder size is nearly 2G. My observation is that the rsync takes 6–7 mins here and there during peak hours and finishes in under 2 mins most of the time. And this is running at 2-hour intervals. One way to optimize is to run rsync at shorter intervals and run many more rsync jobs at much lower levels in the directory tree, though there will always be lag time compared to synchronous mode.
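
The "many more rsync jobs at lower levels" idea could look roughly like this. It is an untuned sketch: the function name and the `RSYNC` override variable are made up, `xargs -P` is assumed to be available (GNU and BSD xargs have it), and subdirectory names are assumed to contain no quotes or newlines.

```shell
#!/bin/sh
# RSYNC can be overridden (e.g. with a stub) for testing.
RSYNC="${RSYNC:-rsync}"

# Read subdirectory names (one per line) on stdin and sync each one
# as its own rsync job, running up to $jobs of them at a time.
sync_subtrees() {
    src_root="$1"; dest_root="$2"; jobs="${3:-4}"
    xargs -P "$jobs" -I{} "$RSYNC" -qazHS --delete \
        "$src_root/{}/" "$dest_root/{}/"
}
```

Feeding it a list of first-level folder names, for example `printf '%s\n' alpha beta | sync_subtrees 10.19.1.128:/export/data/appone/projects/objects /export/data/appone/projects/objects 4` with `alpha` and `beta` as stand-ins, starts one job per folder. Two caveats with splitting this way: files sitting directly in the top-level folder are not covered, and a first-level folder deleted on the source is not removed on the destination, so an occasional full top-level run is probably still needed.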
