Expert Eye

How to use Linux to create system images – for free!

DISCLAIMER: This article is intended for readers who have some basic knowledge of Linux and at least a rudimentary familiarity with its command line interface.

A warning before you start reading: this article is a bit long, and it gets a bit technical in places. That is only because I opted to explain how things work in as much detail as possible, so that you can understand the process rather than follow it blindly.

If you are just looking for the “simple” version, skim the article and read only the commands (lines starting with “#”), along with the brief explanations that follow each of them.

Many people pay for software such as Norton Ghost to take images, snapshots or backups of their PCs, so that they can recover after a hard disk failure or an accidental deletion, or migrate their current system to a newly purchased one. The method described here can even be used to recover data from a hard drive that has failed, is corrupt, or has had files deleted accidentally.

A little-known fact is that you can get all that functionality for free, using Linux.

The process described below is pretty much universal and supports the vast majority of operating systems out there. You can even use it on Windows machines by booting a Linux live CD, without installing anything at all on your machine. You just run the system off the CD!

All you need in order to perform the above-mentioned tasks are:

  • A Linux live CD of your preference. (Ubuntu seems to be quite popular)
  • A spare disk of identical or larger capacity than the one you are trying to copy (a USB drive works too but, as you may expect, it is much slower)

The whole process is quite simple to begin with. Put the live CD in your optical drive (CD-ROM/DVD) and, if need be, reboot and tell your BIOS to boot from the CD-ROM (give it a higher boot priority than the hard drives).

This setting is not a problem to leave on for everyday use: if there is no bootable CD in the tray, the PC will boot from the hard drive as usual, so it is usually something you only have to do once. The boot process can take a couple of minutes, as your entire system boots off the CD-ROM, so be a bit patient.

After the boot process is done, you will find yourself in a graphical, Windows-like environment.

What you need to do at this stage is open a terminal (xterm/konsole/pick your favourite) and take a look at the way your hard drives have been detected by the operating system.

The command that lists the partitions, as well as the geometry of the drive (why that is useful will become apparent later on), is:

# fdisk -l -u

What this command does is print the partition table of each detected drive, giving positions and sizes in sectors instead of cylinders.

The output would look like this:

Disk /dev/sdb: 1199.9 GB, 1199981985792 bytes
255 heads, 63 sectors/track, 145889 cylinders, total 2343714816 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63   195318269    97659103+   7  HPFS/NTFS
/dev/sdb2       195318270   195703829      192780   83  Linux
/dev/sdb3       203511420  2343706784  1070097682+  83  Linux

On disks that use the traditional MBR partitioning scheme, the first 512 bytes of the disk are reserved for the Master Boot Record (a.k.a. MBR). In simple words, the MBR is the place where the boot loader of the system resides, along with the partition table for the drive. This is all the information the system needs in order to boot up: the loader that boots the system, and a “map” of the partitions, so that it knows what is where.

/dev/sdb is the name under which the hard drive itself is detected. The numbers at the end (/dev/sdb1, /dev/sdb2, ...) indicate the different partitions.
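
The partition lines of the listing can also be turned into start/length pairs with a few lines of shell. This is only a minimal sketch: the variable names (fdisk_lines, partition_summary) are made up for the illustration, and it assumes the old-style fdisk column layout shown above, where a “*” in the second column marks the bootable partition.

```shell
# Partition lines copied from the sample fdisk output above.
fdisk_lines='/dev/sdb1 * 63 195318269 97659103+ 7 HPFS/NTFS
/dev/sdb2 195318270 195703829 192780 83 Linux
/dev/sdb3 203511420 2343706784 1070097682+ 83 Linux'

partition_summary=$(
  printf '%s\n' "$fdisk_lines" | while read -r dev a b c rest; do
    # A "*" in the second column is the boot flag; skip over it when present.
    if [ "$a" = "*" ]; then start=$b; end=$c; else start=$a; end=$b; fi
    # Start and End are both inclusive, hence the +1.
    echo "$dev start=$start sectors=$((end - start + 1))"
  done
)
printf '%s\n' "$partition_summary"
```

The sector counts it prints are the numbers that matter later on, when a single partition is cut out of a whole-disk image.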

For those who care for the details, the MBR itself consists of three parts:

  1. The first 446 bytes contain the boot loader.
  2. The next 64 bytes contain the partition table (4 entries, each 16 bytes).
  3. The last 2 bytes contain the boot signature (the hexadecimal bytes 0x55 0xAA)
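
To make the three-part layout concrete, here is a small sketch that splits a 512-byte MBR image into its parts with dd. It works on a scratch file rather than a real disk, so it is safe to try; the file names and the stand-in MBR (zeros plus the 0x55 0xAA signature) are assumptions made for the illustration.

```shell
workdir=$(mktemp -d)
mbr="$workdir/mbr.img"

# Stand-in MBR: 510 zero bytes followed by the bytes 0x55 0xAA (octal 125 252).
# A real one would come from the first 512 bytes of a disk.
{ dd if=/dev/zero bs=510 count=1 2>/dev/null; printf '\125\252'; } > "$mbr"

# Cut the image into the three parts listed above.
dd if="$mbr" of="$workdir/bootcode.bin"  bs=1 count=446         2>/dev/null
dd if="$mbr" of="$workdir/parttable.bin" bs=1 skip=446 count=64 2>/dev/null
dd if="$mbr" of="$workdir/signature.bin" bs=1 skip=510 count=2  2>/dev/null

bootcode_size=$(wc -c < "$workdir/bootcode.bin" | tr -d ' ')
parttable_size=$(wc -c < "$workdir/parttable.bin" | tr -d ' ')
signature_hex=$(od -An -tx1 "$workdir/signature.bin" | tr -d ' \n')
rm -rf "$workdir"

echo "boot code: $bootcode_size bytes, partition table: $parttable_size bytes, signature: $signature_hex"
```

With “bs=1”, the skip and count values are plain byte offsets, which keeps the arithmetic easy to follow at the cost of speed (irrelevant for 512 bytes).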

Now, in order to perform all our backups/images, we’ll be using a Linux utility called “dd”.

After identifying the disk that you need to copy (easily identifiable by its size in the output of “fdisk -l -u”), we need to mount the drive we want to copy the original drive to (the external/new drive).

“Mounting” is just a complicated word for telling the operating system to “take the physical drive and make it accessible to me under path X”, something similar to what Daemon Tools does for disc images in Windows.

If the path under which we want to access the drive is, say, /mnt/sda1 (for disk sda, partition 1; the directory has to exist, so create it first with “mkdir -p /mnt/sda1” if needed), we use the command below:

# mount -t vfat /dev/sda1 /mnt/sda1

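If your external drive is using a different filesystem, this is not a problem, as you can provide a filesystem type for pretty much anything under the sun. Keep in mind that a FAT32 (“vfat”) drive cannot hold a single file larger than 4 GB, so for big images a different filesystem on the target drive is the safer choice. A list follows, so adjust the option in the -t flag accordingly: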

adfs, affs, autofs, coda, coherent, cramfs, devpts, efs, ext, ext2, ext3, hfs, hpfs, iso9660, jfs, minix, msdos, ncpfs, nfs, nfs4, ntfs, proc, qnx4, ramfs, reiserfs, romfs, smbfs, sysv, tmpfs, udf, ufs, umsdos, usbfs, vfat, xenix, xfs, xiafs

If the drive you are mounting is formatted with a Linux ext3 filesystem you would use “ext3”, if it carries a Windows XP filesystem you would use “ntfs”, and so on.

Now to the interesting stuff. To back up your drive (assuming it was detected as /dev/sdb for the purposes of the exercise), you would issue the following command:

# dd if=/dev/sdb conv=sync,noerror bs=64k | gzip -c > /mnt/sda1/sdb.img.gz

To understand how it works, you need to know that UNIX systems treat nearly all input and output as streams of bytes. Devices, files and the connections between programs can all be read from and written to as a stream of data passing through. Imagine a sink where one device is the tap, with data flowing through like water would.

  • So what this command does, in simple words, is tell “dd” to use /dev/sdb (our original drive) as the input source and make a bit-by-bit copy of it. The “|” character is what is called a “pipe” in UNIX systems, and it means “use the output of the previous command as the input to the next one”. So we take that stream of data from the original drive and feed it to gzip, a compression utility, to make the image smaller.
  • Compressing is a good idea because “dd” works at the block level: it does not copy files over, it creates an exact replica of the original hard drive, empty space, deleted files and all. By compressing the output, you can save quite a lot of space.
  • After passing it through the compression utility, we then redirect the output of gzip (i.e. the compressed hard drive image stream) to an output file ( > ), which will be “/mnt/sda1/sdb.img.gz”.
  • The “sync” option pads every input block that comes up short with zero bytes up to the full block size. On its own that does little here, but it matters in combination with “noerror”.
  • The “noerror” option means “continue to copy even after read errors”. This is useful because hard drives can develop bad sectors, which they normally remap to healthy spare ones, a process that is invisible in everyday use. If dd hits a spot it cannot read, “noerror” keeps it going and “sync” pads the unreadable block with zeros, so everything after it stays at the correct offset in the image (i.e. the “gaps” are filled in).
  • The “bs=64k” option tells dd to use a 64-kilobyte block size. Using such a “large” block size helps speed up the copying process. The trade-off is that, with “noerror” and “sync”, a read error can blank out the whole 64 KB block it falls in rather than a single sector. The block size does not have to match when you restore the image: the image is a plain stream of bytes, so on restore the block size only affects speed.
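
The effect of “conv=sync” is easy to see on a scratch file. The sketch below assumes GNU dd (the one found on typical Linux live CDs) and uses made-up file names; it copies 100 bytes with a 64-byte block size, once without and once with “sync”.

```shell
workdir=$(mktemp -d)

# 100 bytes of "x": one full 64-byte block plus a 36-byte partial block.
dd if=/dev/zero bs=100 count=1 2>/dev/null | tr '\0' 'x' > "$workdir/input.bin"

# Same copy twice: plain, and with conv=sync.
dd if="$workdir/input.bin" of="$workdir/plain.bin"  bs=64           2>/dev/null
dd if="$workdir/input.bin" of="$workdir/padded.bin" bs=64 conv=sync 2>/dev/null

plain_size=$(wc -c < "$workdir/plain.bin" | tr -d ' ')
padded_size=$(wc -c < "$workdir/padded.bin" | tr -d ' ')
# Whatever is left after deleting the "x" bytes is the zero padding.
padding_bytes=$(tr -d 'x' < "$workdir/padded.bin" | wc -c | tr -d ' ')
rm -rf "$workdir"

echo "plain: $plain_size bytes, with sync: $padded_size bytes ($padding_bytes zero bytes of padding)"
```

The short final block is padded up to a full 64 bytes. On a disk whose size is a multiple of the block size this never triggers during a clean read, which is why the option is harmless for the backup and only kicks in around read errors.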

To restore your system, you would issue a command similar to this:

# gunzip -cd /mnt/sda1/sdb.img.gz | dd of=/dev/sdb bs=64k

  • Here, in effect, we are reversing the process: we tell the compression utility to decompress (-d) the compressed image and pass the output to “dd” as input, and in turn ask “dd” to write the decompressed raw image to the device /dev/sdb (replace accordingly if you do this on a different computer, depending on what the new device has been recognized as by “fdisk -l”).
  • Note that “conv=sync,noerror” is left out on purpose here. When dd reads from a pipe it can receive blocks that are shorter than 64k, and “sync” would pad each of them with zeros, corrupting the restored data.
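
Before trusting the procedure with a real drive, the whole backup-and-restore round trip can be rehearsed on scratch files. In this sketch, disk.img stands in for /dev/sdb and restored.img for the replacement drive; both names, and the 1 MB size, are assumptions chosen for the illustration.

```shell
workdir=$(mktemp -d)
disk="$workdir/disk.img"          # stand-in for the original drive
restored="$workdir/restored.img"  # stand-in for the replacement drive

# A 1 MiB "disk": mostly empty space, with a little data near the start.
dd if=/dev/zero of="$disk" bs=64k count=16 2>/dev/null
printf 'some file contents' | dd of="$disk" bs=512 seek=4 conv=notrunc 2>/dev/null

# Back up: raw copy piped through gzip.
dd if="$disk" conv=sync,noerror bs=64k 2>/dev/null | gzip -c > "$workdir/disk.img.gz"

# Restore: decompress and write the raw image back (no conv=sync when reading from a pipe).
gunzip -c "$workdir/disk.img.gz" | dd of="$restored" bs=64k 2>/dev/null

raw_size=$(wc -c < "$disk" | tr -d ' ')
gz_size=$(wc -c < "$workdir/disk.img.gz" | tr -d ' ')
if cmp -s "$disk" "$restored"; then roundtrip=identical; else roundtrip=different; fi
rm -rf "$workdir"

echo "raw image: $raw_size bytes, compressed: $gz_size bytes, restored copy is $roundtrip"
```

Because the scratch disk is almost entirely zeros, the compressed image is a tiny fraction of the raw size, which is the same effect that makes compressing the empty space of a real drive worthwhile.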

Now, as you know, a hard drive can contain a number of partitions, not just one. What happens if you want to copy just one partition out of the entire image?

Remember when I explained “fdisk -l -u” above, where the -u option gives us the output in sectors? This is where it comes in handy.

Here is the output again, for clarity, same as above:

Disk /dev/sdb: 1199.9 GB, 1199981985792 bytes
255 heads, 63 sectors/track, 145889 cylinders, total 2343714816 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *          63   195318269    97659103+   7  HPFS/NTFS
/dev/sdb2       195318270   195703829      192780   83  Linux
/dev/sdb3       203511420  2343706784  1070097682+  83  Linux

We can simply decompress the original image of the entire hard drive; because it is a bit-for-bit copy, it is identical to the data on the original hard drive.

So, for the example output above, if we were only interested in recovering the second partition (/dev/sdb2), we would run a command on the image we took, to “trim” out the part that interests us, like so:

# dd if=/path/to/the/decompressed/sdb.img of=/tmp/sdb2.img bs=512 skip=195318270 count=$((195703829-195318270+1))

Now, I know this looks complicated, but it is not. Let me take the command apart, so you understand it better:

  • We ask “dd” to take the image of the hard drive we took previously as input, instead of the actual hard drive (assuming we don’t have it anymore, as backup recovery can be necessary at inconvenient times), and to output what we “trim” (i.e. the partition we care about) to /tmp/sdb2.img, which is a new image in itself.
  • Because, as you can see in the output of fdisk above, the sector size is 512 bytes, and it is therefore exactly the same in the mirror image we have, we use the “bs=512” option to tell “dd” to stick to that (we are dealing with sectors here, so keeping the “units” the same keeps the calculations simple).
  • We then tell it to skip the first 195318270 sectors (everything that precedes the partition we want to recover, which we don’t care about), and to perform the operation for a number of sectors equal to the length of the partition in question.
  • That length is where the partition “ends” minus where it “starts”, plus one, because both the start and the end sector belong to the partition. Here that comes to 385560 sectors, which matches the 192780 one-kilobyte blocks fdisk reports. The Linux shell is smart enough to handle the calculation on its own if you provide it in the $((...)) format used above, instead of us having to crack the calculator out!
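
The sector arithmetic, and the skip/count technique itself, can be checked on a scaled-down example. The first part below uses the real Start and End values of /dev/sdb2 from the fdisk output; the second part is a sketch on a made-up 8-sector scratch “disk” whose sectors 3 to 5 play the role of the partition.

```shell
# Sector arithmetic for /dev/sdb2, from the Start and End columns of fdisk.
start=195318270
end=195703829
count=$((end - start + 1))     # Start and End are both inclusive
size_bytes=$((count * 512))
echo "sdb2: $count sectors, $size_bytes bytes"

# Scaled-down demo: an 8-sector scratch "disk" with a "partition" in sectors 3..5.
workdir=$(mktemp -d)
disk="$workdir/disk.img"
dd if=/dev/zero of="$disk" bs=512 count=8 2>/dev/null
# Fill the partition's three sectors with "P" so they are recognisable.
dd if=/dev/zero bs=512 count=3 2>/dev/null | tr '\0' 'P' |
  dd of="$disk" bs=512 seek=3 conv=notrunc 2>/dev/null

# Cut the partition out of the whole-disk image, exactly as in the command above.
p_start=3
p_end=5
dd if="$disk" of="$workdir/part.img" bs=512 skip="$p_start" count=$((p_end - p_start + 1)) 2>/dev/null

part_size=$(wc -c < "$workdir/part.img" | tr -d ' ')
part_marked=$(tr -cd 'P' < "$workdir/part.img" | wc -c | tr -d ' ')
rm -rf "$workdir"
echo "extracted $part_size bytes, $part_marked of them belong to the partition"
```

If the “+1” is left out, the extracted image comes up one sector short, which is easy to miss on a real partition of hundreds of thousands of sectors.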

If you know in advance that you only want an image of one specific partition, the command is much, much simpler:

# dd if=/dev/sdb2 of=/mnt/sda1/sdb2.img

Obviously, to recover, you just swap the paths of “if=” and “of=”.

The more complicated method is primarily explained here because:

  1. This information is relatively hard to find, especially with an explanation, and
  2. It fits the more generic needs of people: it is always better to back up everything and then have the option of not recovering it all, than to realize after the fact that you need something you didn’t back up.

Now, for the more advanced Linux users out there, we are going to discuss performing a similar operation on the Master Boot Record (MBR), which most people probably won’t need to do.

DISCLAIMER: This is only intended for people who know what they are doing, as fiddling with the MBR can destroy your partition table, and you can lose your data.

Remember how I explained that the MBR is divided into 3 logical parts? I include the explanation again, same as above, for your convenience:

For those who care for the details, the MBR itself consists of three parts:

  1. The first 446 bytes contain the boot loader.
  2. The next 64 bytes contain the partition table (4 entries, each 16 bytes).
  3. The last 2 bytes contain the boot signature (the hexadecimal bytes 0x55 0xAA)

To copy the entire MBR the command would be:

# dd if=/dev/sdb of=/mnt/sda1/mbr.img bs=512 count=1

  • This stores the first 512 bytes of the disk (containing the boot loader, the partition table with its four primary entries, and the signature) into the file “mbr.img”.
  • In simpler words, what we are effectively telling “dd” to do here is “read from /dev/sdb, write to /mnt/sda1/mbr.img, use a block size of 512 bytes (the size of the MBR), and perform the operation for one 512-byte block (count=1)”. We effectively tell it to read the disk in 512-byte chunks and stop after the first chunk.

To copy the boot loader only, use “bs=446 count=1” instead. To copy the partition table only, use “bs=1 skip=446 count=64”.

Similarly, if you only want to restore the actual MBR boot code and not the primary partition table entries, just restore the first 446 bytes of the MBR:

# dd of=/dev/sdb if=/mnt/sda1/mbr.img bs=446 count=1
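
Since a mistake here can wipe a partition table, it is worth rehearsing the boot-code-only restore on scratch files first. The sketch below restores only the first 446 bytes (the boot code area) and checks that the partition table bytes are left alone. The file names and the marker letters (“B”, “T”, “X”, “N”) are made up for the illustration, and “conv=notrunc” is only there because the target is a regular file, which dd would otherwise cut short.

```shell
workdir=$(mktemp -d)
backup="$workdir/mbr.img"   # stand-in for the saved MBR
disk="$workdir/disk.img"    # stand-in for the drive being repaired

# Saved MBR: 446 bytes of boot code ("B"), 64 bytes of partition table ("T"), signature.
{ dd if=/dev/zero bs=446 count=1 2>/dev/null | tr '\0' 'B'
  dd if=/dev/zero bs=64  count=1 2>/dev/null | tr '\0' 'T'
  printf '\125\252'; } > "$backup"

# "Drive": damaged boot code ("X"), a newer partition table ("N"), then three empty sectors.
{ dd if=/dev/zero bs=446 count=1 2>/dev/null | tr '\0' 'X'
  dd if=/dev/zero bs=64  count=1 2>/dev/null | tr '\0' 'N'
  printf '\125\252'
  dd if=/dev/zero bs=512 count=3 2>/dev/null; } > "$disk"

# Restore the boot code only: one block of 446 bytes, nothing more.
dd if="$backup" of="$disk" bs=446 count=1 conv=notrunc 2>/dev/null

boot_bytes=$(dd if="$disk" bs=1 count=446 2>/dev/null | tr -cd 'B' | wc -c | tr -d ' ')
table_bytes=$(dd if="$disk" bs=1 skip=446 count=64 2>/dev/null | tr -cd 'N' | wc -c | tr -d ' ')
disk_size=$(wc -c < "$disk" | tr -d ' ')
rm -rf "$workdir"

echo "boot code bytes restored: $boot_bytes, partition table bytes untouched: $table_bytes, disk size: $disk_size"
```

Any block size larger than 446 would start overwriting the partition table, which is exactly the part this kind of restore is meant to preserve.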

I believe it is quite obvious that using such commands, and operating at the sector/block level on hard drives, is not for everyone, but the flexibility the command line interface gives you is much greater than what a typical GUI-based Windows utility can offer.

You can “trim” data any way you like, and even use the above methods to recover data from a damaged hard drive. In the worst case scenario you will lose some data, but if you are lucky, you will be able to recover a very large chunk of it.

As a result, “dd” is a very powerful and valuable tool for the person who understands how it works and is a bit familiar with the command line interface.

