Wednesday, August 19, 2009

Linux Backup: Hard Disk Clone with "dd"

SkyHi @ Wednesday, August 19, 2009
Most Windows users probably know "Norton Ghost", a backup program for hard disks. It can back up a whole hard disk or a single partition to an image file, and it can also copy the entire contents of one hard disk to another. However, Norton Ghost is Windows software, so users of other operating systems (such as Linux) cannot take advantage of it. Fortunately, most Unix/Linux systems provide a command-line tool with similar functionality, called "dd".

In fact, "dd" is much more powerful than Norton Ghost, and it accepts many arguments that control its behavior. This short article covers only how to back up a whole hard disk or a partition.
Hard Disk Clone

Suppose you have a 40GB hard disk and a 60GB removable hard disk, and you want to back up everything from the hard disk to the removable disk. With "dd" this is a very easy task. Suppose further that the hard disk's Unix device name is /dev/sda and the removable disk is /dev/sdb. The following command copies all the content from /dev/sda to /dev/sdb:

dd if=/dev/sda of=/dev/sdb

Here, if=... sets the source and of=... sets the destination. "dd" does not care about the contents of the hard disk. It simply reads bytes from /dev/sda and writes them to /dev/sdb; it has no notion of files. The file system on the disk and the number of partitions therefore do not matter. For example, if /dev/sda is split into three partitions, /dev/sdb will end up with the same three partitions, i.e. the destination becomes an exact copy of the source.
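
The byte-for-byte behavior is easy to try without touching a real disk. The following is only a practice sketch: two ordinary scratch files (the hypothetical names src.img and dst.img in a temporary directory) stand in for /dev/sda and /dev/sdb, and "cmp" confirms that the copy is identical.

```shell
#!/bin/sh
# Practice run: regular files stand in for /dev/sda and /dev/sdb (assumed names).
workdir=$(mktemp -d)
src="$workdir/src.img"
dst="$workdir/dst.img"

# Fill the stand-in "source disk" with 1 MiB of random bytes.
dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null

# Clone it the same way as above; dd prints its statistics on stderr.
dd if="$src" of="$dst" 2>/dev/null

# cmp prints nothing and exits 0 when the two files are byte-identical.
if cmp "$src" "$dst"; then
    clone_status="identical"
else
    clone_status="different"
fi
echo "$clone_status"
```

Because no real device is involved, this can be run as an ordinary user.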

Note: to run "dd" on disk devices you need to log in as "root" or switch to "root" with the "su" command. Be careful: a small mistake, such as swapping "if" and "of", can overwrite the wrong disk and destroy its data!
Making a Hard Disk Image File

Most of the time you do not need a complete duplicate of your hard disk. You may prefer to create an image file of the hard disk and save it on another storage device. The following command creates an image file "disk1.img" in your home directory from /dev/sda (the directory that receives the image must be on a different disk from the one being imaged):

dd if=/dev/sda of=~/disk1.img

Once you have created an image file, you can compress it with "gzip" or "bzip2":

gzip disk1.img    # generates disk1.img.gz

or

bzip2 disk1.img   # generates disk1.img.bz2

Compression can save a lot of storage space, but it can take a very long time on a large image.
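
Imaging and compression can also be combined so that the uncompressed image never has to be stored. When "of=" is omitted, dd writes to standard output, which can be piped into gzip. A minimal sketch, assuming a scratch file (the hypothetical disk.raw) in place of /dev/sda:

```shell
#!/bin/sh
workdir=$(mktemp -d)
disk="$workdir/disk.raw"          # stand-in for /dev/sda (assumption)
image="$workdir/disk1.img.gz"

# A mostly empty "disk": 4 MiB of zeros, like unused space on a real drive.
dd if=/dev/zero of="$disk" bs=1024 count=4096 2>/dev/null

# Without of=, dd writes to stdout; gzip compresses the stream on the fly.
dd if="$disk" bs=64k 2>/dev/null | gzip -c > "$image"

raw_size=$(wc -c < "$disk")
gz_size=$(wc -c < "$image")
echo "raw: $raw_size bytes, compressed: $gz_size bytes"

# gzip -t checks the integrity of the compressed file without unpacking it.
image_ok=no
gzip -t "$image" && image_ok=yes
echo "image ok: $image_ok"
```

On a real disk the saving depends on the data: empty (zeroed) regions shrink to almost nothing, while already compressed files hardly shrink at all.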
Partition Clone

Backing up a hard disk partition is very similar to backing up a whole hard disk, because Unix/Linux uses device names such as /dev/sda1, /dev/sda5, ... to refer to partitions. For example, to create an image file from the first partition of /dev/sda, use "dd" like this:

dd if=/dev/sda1 of=~/disk2.img

Also, you can compress the image file:

gzip disk2.img

You can also copy one partition directly to another: just set "of" to the destination partition's device name. For example:

dd if=/dev/sda1 of=/dev/sdb5

This command copies all the contents of /dev/sda1 to /dev/sdb5. Make sure that /dev/sdb5 is at least as large as /dev/sda1.
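
The size requirement can be checked before copying. The sketch below is one possible way to do it, not part of dd itself: the helper names size_of and fits are hypothetical, "blockdev --getsize64" (from util-linux, usually run as root) reports the size of a block device, and scratch files are used for the demonstration instead of /dev/sda1 and /dev/sdb5.

```shell
#!/bin/sh
# size_of: print the size in bytes of a block device or a regular file.
# (Hypothetical helper.)
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # util-linux; usually needs root
    else
        wc -c < "$1"
    fi
}

# fits SRC DST: succeed only if DST is at least as large as SRC.
fits() {
    [ "$(size_of "$2")" -ge "$(size_of "$1")" ]
}

# Demonstration with scratch files instead of real partitions.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/small" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$workdir/large" bs=1024 count=200 2>/dev/null

if fits "$workdir/small" "$workdir/large"; then
    echo "safe to copy small -> large"
fi
if ! fits "$workdir/large" "$workdir/small"; then
    echo "refusing to copy large -> small"
fi
```

With real partitions the call would look like: fits /dev/sda1 /dev/sdb5 && dd if=/dev/sda1 of=/dev/sdb5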
Restoring from an Image File

To restore a partition or a hard disk from an image file, just swap the "if" and "of" arguments. For example, to restore the whole hard disk from the image file "disk1.img":

dd if=disk1.img of=/dev/sda

To restore the first partition of /dev/sda from the image file "disk2.img":

dd if=disk2.img of=/dev/sda1
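
An image that was compressed with gzip does not have to be unpacked to a file first: it can be decompressed to standard output and piped into dd. The following round trip is a sketch under the assumption that scratch files (hypothetical names disk.raw and restored.raw) replace the real devices.

```shell
#!/bin/sh
workdir=$(mktemp -d)
disk="$workdir/disk.raw"            # stand-in for /dev/sda (assumption)
restored="$workdir/restored.raw"    # stand-in for the disk being restored

# Create a 512 KiB "disk" with random contents and image it with compression.
dd if=/dev/urandom of="$disk" bs=1024 count=512 2>/dev/null
dd if="$disk" 2>/dev/null | gzip -c > "$workdir/disk1.img.gz"

# Restore: decompress to stdout and let dd write the stream to the target.
gzip -dc "$workdir/disk1.img.gz" | dd of="$restored" 2>/dev/null

# Verify the round trip; cmp -s is silent and only sets the exit status.
if cmp -s "$disk" "$restored"; then
    restore_status="restored copy matches"
else
    restore_status="restored copy differs"
fi
echo "$restore_status"
```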

Submitted by Igor (not verified) on Tue, 07/21/2009 - 16:20.

There is one very serious problem with your text: you do not mention that dd copying must be done on unmounted disks/partitions. If you clone a mounted partition and the system writes to it during the copy (which will almost certainly happen because of various system activities), the affected blocks of your copy end up in an inconsistent state.

If you want to use dd, boot from a CD with a Linux distribution such as Knoppix, or boot from your distro's installation disk and choose rescue mode instead of install (some distros, such as Fedora, provide this), and then use dd to copy UNMOUNTED partitions.

To copy the contents of the mounted ones, use rsync instead.
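
A quick check before running dd can catch a mounted source. The sketch below assumes a Linux system, where /proc/mounts lists the active mounts; the helper name is_mounted is hypothetical, and the demonstration reads a small sample file instead of the live list. It uses a simple prefix match so that asking about /dev/sda also catches /dev/sda1, and it will miss devices that are mounted under another name (for example through a device mapper path).

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if a mount source in MOUNTS_FILE starts with DEVICE.
# MOUNTS_FILE defaults to /proc/mounts. (Hypothetical helper.)
is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Demonstration with a sample mounts file rather than the live system.
workdir=$(mktemp -d)
cat > "$workdir/mounts" <<'EOF'
/dev/sda1 / ext3 rw 0 0
/dev/sda5 /home ext3 rw 0 0
proc /proc proc rw 0 0
EOF

for dev in /dev/sda /dev/sda5 /dev/sdb; do
    if is_mounted "$dev" "$workdir/mounts"; then
        echo "$dev: in use, do not image it with dd"
    else
        echo "$dev: not mounted"
    fi
done
```

On a live system the second argument is simply left out, e.g. is_mounted /dev/sda1 || dd if=/dev/sda1 of=...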

I believe the USB cable was most likely the cause of the errors. I managed to clone the HDD over the network to another server and the image seems to be error free. If you are on server 1 and you want to make a backup clone of /dev/sda on the HDD of server 2 (IP address, the method for cloning I used was as follows (note I am running a live CD from server 1, so /dev/sda is not mounted and can be reliably cloned):
# cat /dev/sda | gzip -c | ssh "cat - > server.img.gz"
cat /dev/sda reads the entire /dev/sda device (the HDD you want to clone) and dumps it to STDOUT.
gzip -c takes STDIN, compresses it and writes the result to STDOUT.
ssh opens a connection to the other server (you may be asked for that server's password) and pipes its STDIN to the command line run there, which is: cat - > server.img.gz
cat - copies the input, which came from server 1 through ssh, to its STDOUT, and the redirection writes it into the file server.img.gz.
I haven't tried it yet, but to restore the image, something like this should be run:
# ssh "cat server.img.gz" | zcat >/dev/sda
Note that compression and decompression both occur on the local server that houses the /dev/sda being cloned. This means that compressed rather than uncompressed data flows over the network, which reduces network traffic.
Thirdly, if you want to check that the image is error free the following line should do it:
# ssh "cat server.img.gz" | zcat | cmp -l /dev/sda
The cmp command reports differences between its STDIN (the decompressed data stream from the image on the other server) and the "file" /dev/sda. If there are no differences, it prints nothing. If cmp -l does print lines, each one gives the offset of a byte that differs between the 2 data streams, followed by the two differing byte values.
If you want something a bit more mentally reassuring than the cmp -l function (that returns nothing if it succeeds) you can use md5 to compare a checksum from the 2 respective streams:
# cat /dev/sda | md5
will give you a check-sum for the local /dev/sda
# ssh "cat server.img.gz" | zcat | md5
will give you a checksum for the image on the remote server.
If the 2 checksums match, then the 2 streams should be identical.
Another point of interest is that I cloned a 500GB HDD, but the compressed image file is only <30GB. This indicates that most of the 500GB HDD is all zeroes and therefore compresses very well. The fact that the cmp worked for the network backup, but failed for the external HDD backup indicates that the most likely cause of the errors was the not-too-well shielded cable running from the server to the external HDD.
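
On most Linux systems the checksum tool is called "md5sum" (the "md5" command used above is the BSD/macOS name). The following sketch shows the same comparison done locally, assuming a scratch file (hypothetical disk.raw) in place of /dev/sda; over the network the second stream would come from ssh as in the commands above.

```shell
#!/bin/sh
workdir=$(mktemp -d)
disk="$workdir/disk.raw"     # stand-in for /dev/sda (assumption)
dd if=/dev/urandom of="$disk" bs=1024 count=256 2>/dev/null
gzip -c "$disk" > "$workdir/server.img.gz"

# md5sum prints "<hash>  <name>"; cutting the first field leaves only the
# hash, so the two values can be compared directly.
sum_disk=$(md5sum < "$disk" | cut -d' ' -f1)
# gzip -dc is equivalent to zcat: decompress to stdout.
sum_image=$(gzip -dc "$workdir/server.img.gz" | md5sum | cut -d' ' -f1)

if [ "$sum_disk" = "$sum_image" ]; then
    echo "checksums match: $sum_disk"
else
    echo "checksums differ"
fi
```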

The syntax would be like

dd if=/dev/hda conv=sync,noerror bs=64k | gzip -c > /path/to/output/file/hda.img.gz

Replace /dev/hda with your HDD device. Note that you need to back up everything, including the partition table, so image /dev/hda itself and not /dev/hda1, /dev/hda2 and so on.
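
About the two conv options in that command: "noerror" tells dd to keep going after a read error instead of stopping, and "sync" pads every short input block with zero bytes up to the block size, so that the data following a bad block keeps its offset in the image. The padding can be observed harmlessly on a tiny input. This sketch feeds 3 bytes to dd with an 8-byte block size; it is only an illustration of the option, not a disk operation.

```shell
#!/bin/sh
# Without conv=sync the 3 input bytes pass through unchanged.
plain=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c)

# With conv=sync the short block is padded with NUL bytes to a full 8 bytes.
padded=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c)

echo "plain: $plain bytes, padded: $padded bytes"
```

The same padding is the reason "sync" is best avoided when dd reads from a pipe, where short reads are normal and the inserted zeros would corrupt the stream.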

2 other things to remember -

- the drive should be unmounted when you run dd. dd and gzip are typically on rescue disks, so you can boot from the CD in recovery mode, unmount all partitions on the drive and create an image. Imaging the currently running drive would work too, but you will most likely end up with a slightly corrupt ext2/3 filesystem, since the partition/node list is imaged about 5 minutes before the actual files, and log files and other files written during imaging will be corrupt.

- obviously, the created image file needs to be on another drive.

the reverse procedure is

gzip -dc /our/backup/file.img.gz | dd of=/dev/hda bs=64K