First Time Linux


There are many tools for comparing files, and many of them are built into programming environments and source control systems. Some are graphical and some are purely for the command line, but the aim is the same: to compare two (or three) files and show you what differs between them.

A patch takes this idea one step further: the comparison is saved as a small description of the changes. If you then apply that patch to the original file, the result is the second file, that is, the original with those changes made.

This is a powerful way to communicate changes, because you don't have to send the whole file, just the small description of what changed. It is especially convenient when a large set of files has to be modified, because a single patch can describe multiple changes to multiple files.

In general, a tool like diff is used to create the descriptions, and a tool like patch is used to apply the change. The changes themselves may be called either patches or diffs.

Making a simple comparison

First, let's compare two versions of a simple file using the GUI tool "meld", which comes with GNOME. You can do the same thing with other tools such as kdiff3.

This example uses a little text file, with some changes made to the file between versions. One line has been deleted, one modified, and one added, as the screenshot shows:

[Screenshot: meld showing the two versions of the file side by side]


Secondly, let's run diff on the same pair of files:

$ diff firstfile secondfile
< Third line
< A line with a speling mitsake
> A line with a spelling mistake
> An extra line

This output is certainly not as easy to understand as the graphical output, but it contains the same information. The lines starting with "<" are only in the first file (and will be removed by applying the patch); the lines starting with ">" are only in the second file (and will be added by the patch). The modified line is represented by a removal followed by an addition. Real diff output also contains short location lines (such as "3,4c3,4") and a "---" separator, which tell patch where each change belongs.
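To try this yourself, the following sketch recreates a pair of files like the ones compared above and runs diff on them. The file contents and the scratch directory are assumptions made for illustration, since the original files are not shown in full. It also records diff's exit status, which is handy in scripts.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical contents, chosen to match the changes described above.
printf '%s\n' 'First line' 'Second line' 'Third line' \
    'A line with a speling mitsake' 'Last line' > firstfile
printf '%s\n' 'First line' 'Second line' \
    'A line with a spelling mistake' 'An extra line' 'Last line' > secondfile

# diff exits with 0 if the files match, 1 if they differ, and 2 on error.
status=0
diff firstfile secondfile || status=$?
echo "diff exit status: $status"
```

Because an exit status of 1 only means "the files differ", a script that treats every non-zero status as a failure needs to allow for it, as the "|| status=$?" does here.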

Making a patch

The output of the diff command shown above is, in effect, the patch. It describes all the changes necessary and can be used as input to the patch command. All we need to do is save the patch by redirecting the output of the diff command into a file.

$ diff firstfile secondfile > mypatch.txt

So now we've got a patch file called "mypatch.txt" with the same contents as shown above. To test it, we can apply the patch to a copy of the file, so that we keep the original too.

$ cp firstfile dummy.txt
$ patch dummy.txt mypatch.txt
patching file dummy.txt

Now dummy.txt has been patched: its contents have been replaced by a version including all the changes, so it should be identical to "secondfile".
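Rather than trusting that the result "should be" the same, you can check it. The sketch below repeats the steps above in a scratch directory and then compares the patched copy with the second file using cmp. The file contents are assumed for illustration; the file names are the ones used in this guide.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical contents standing in for the two versions of the file.
printf '%s\n' 'First line' 'Second line' 'Third line' \
    'A line with a speling mitsake' 'Last line' > firstfile
printf '%s\n' 'First line' 'Second line' \
    'A line with a spelling mistake' 'An extra line' 'Last line' > secondfile

# Save the patch; "|| true" because diff exits with 1 when the files differ.
diff firstfile secondfile > mypatch.txt || true

# Apply the patch to a copy, keeping the original untouched.
cp firstfile dummy.txt
patch dummy.txt mypatch.txt

# cmp -s prints nothing and exits with 0 only if the files are byte-identical.
if cmp -s dummy.txt secondfile; then
    result="identical"
else
    result="different"
fi
echo "dummy.txt and secondfile are $result"
```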

And those are the basics of the patch. If someone has "firstfile" and you've changed it to make "secondfile", you can just send this patch. The other person applies it to their copy of the first file and gets the same file as you have.

Patching multiple files

A common use for patches is for software projects, where changes need to be made to multiple files. It would be awkward to make a patch for each file, so patches have a way of specifying which files need to be patched and how. This uses the same principle as before, but employs different options to make a diff for a complete set of files instead of just for one file.

As an example we'll take the same files used in the rsync guide, as follows:

Desktop
  file1.txt
    This is the main version of the first file.
    It's quite simple.
  file2.txt
    This is the second file.
    This won't be modified by either side.
  file3.txt
    This one will be changed on the host but not on the laptop.
    See, now the main copy has been changed on the desktop.
  file4.txt
    This file only exists on the host, so it's like it's been deleted from the second set.

Laptop
  file1.txt
    This is the second version of the first file.
    It's now been edited, with some extra text added to it.
  file2.txt
    This is the second file.
    This won't be modified by either side.
  file3.txt
    This one will be changed on the host but not on the laptop.
  file5.txt
    This file is new in the second set.

So now we've got two folders of files, here called "a" and "b", and we can use the diff command as before to create a patch file. Except this time we diff the whole directories:

$ diff -uN a b > multipatch.txt

There are two new options here. The first, "-u", makes a unified diff, which includes some surrounding context (lines before and after each change). This makes the diff a bit bigger, but makes it more reliable in case the file being patched isn't exactly the same as the one which was diffed. It helps readability too, making it more obvious where the edit is made. The second, "-N", includes the content of new files in the patch, rather than just noting that there's a new file. If the folders contain subfolders, add "-r" as well so that diff descends into them.
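As a rough, runnable illustration of these options, the sketch below builds two small folders and diffs them. The folder names "a" and "b" follow the example; the scratch directory, the reduced set of files and the patch name multipatch.txt are assumptions made to keep the example short.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir a b

# A file that differs between the two folders.
printf '%s\n' 'This is the main version of the first file.' \
    "It's quite simple." > a/file1.txt
printf '%s\n' 'This is the second version of the first file.' \
    "It's now been edited, with some extra text added to it." > b/file1.txt

# A file that is identical in both folders.
printf '%s\n' 'This is the second file.' > a/file2.txt
cp a/file2.txt b/file2.txt

# A file that only exists in the second folder.
printf '%s\n' 'This file is new in the second set.' > b/file5.txt

# -u: unified format with context lines; -N: treat a missing file as empty,
# so the whole content of file5.txt ends up in the patch.
# "|| true" because diff exits with 1 when it finds differences.
diff -uN a b > multipatch.txt || true

cat multipatch.txt
```

Without "-N", diff would only report that file5.txt exists in "b", and the patch could not recreate it.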

Inside this new patch is a section for each file which was compared. The part for file1.txt looks like this:

diff -uN a/file1.txt b/file1.txt
--- a/file1.txt	2011-10-21 16:09:04.000000000 +0200
+++ b/file1.txt	2011-10-21 16:09:11.000000000 +0200
@@ -1,3 +1,3 @@
 This is the main version of the first file.
-It's quite simple.
+It starts off quite simple, but then has some extra text added to it.

So it generates a slightly different format, but with removed (-) and added (+) lines as before. It also contains the relative file paths a/file1.txt and b/file1.txt. This is important because patch uses these paths to find the file to modify when the patch is applied.

Interestingly, the patch contains nothing at all about file2.txt: it was found to be identical in both folders, so it isn't even mentioned in the diff.
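A quick way to see which files a patch touches, without reading all of it, is to pull out the header lines. This is only a sketch: it assumes a unified diff called multipatch.txt, and it builds a tiny pair of folders with made-up contents so that it runs on its own.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir a b

# One file that differs, one that is identical (hypothetical contents).
printf '%s\n' 'It is quite simple.' > a/file1.txt
printf '%s\n' 'It has been edited.' > b/file1.txt
printf '%s\n' 'Unchanged in both folders.' > a/file2.txt
cp a/file2.txt b/file2.txt

diff -uN a b > multipatch.txt || true

# Each patched file has a header line starting with "+++ ", followed by the
# path, a tab and a timestamp. Keep only the path.
touched=$(grep '^+++ ' multipatch.txt | cut -f1 | sed 's/^+++ //')
echo "$touched"
```

Only b/file1.txt is listed, because the identical file2.txt never made it into the patch. Note that an added line whose own text starts with "++ " would also match this simple pattern.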

Applying a patch

So now imagine you've got your own folder of these files, and you've been sent this patch. Now you want to use the patch command to apply this patch to your set. It sounds straightforward, but the complication comes from the relative paths, because your directory isn't called "a".

If you're in your folder with all your files, you need to strip off one item from the paths in the patch file, so that "a/file1.txt" becomes "file1.txt" and then patch can find the file. This is especially important if there is a tree of directories involved, so that "a/somedir/file8.txt" becomes "somedir/file8.txt". The way this is done is using the -p parameter to patch, like this:

$ patch -p1 < multipatch.txt

Here -p1 tells patch to strip one item from the front of each path, and < multipatch.txt feeds it the patch file on standard input; patch works out for itself which files should be patched. If it works, the output tells you which files were successfully patched. If some changes cannot be applied, patch saves them in .rej files next to the files concerned. If patch cannot find the files at all, the answer is probably in the -p parameter.
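If you are unsure whether the -p value is right, it can help to do a trial run first. The sketch below assumes GNU patch, whose --dry-run option reports what would happen without changing any files. The folder "mine" and the file contents are hypothetical stand-ins for your own copy of the files.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir a b mine

# The sender's two folders, and our own copy of the original file.
printf '%s\n' 'First line' 'It is quite simple.' > a/file1.txt
printf '%s\n' 'First line' 'It has been edited.' > b/file1.txt
cp a/file1.txt mine/file1.txt

# The patch as the sender would create it.
diff -uN a b > multipatch.txt || true

# Apply it from inside our own folder, stripping the leading "a/" or "b/".
cd mine || exit 1

# Trial run first; only patch for real if the trial run succeeds.
if patch -p1 --dry-run < ../multipatch.txt; then
    patch -p1 < ../multipatch.txt
fi
```

Running the trial first means a wrong -p value shows up as an error message rather than as half-applied changes.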

Of course this only scratches the surface of the diff and patch commands. There are lots of other options, including making backups, merges, and reverse patches. For more information see the man pages.
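As a small taste of one of those options, a reverse patch undoes a patch that has already been applied. The sketch below uses the -R option of patch for this; the folder "mine" and the file contents are assumptions made so that the example runs on its own.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir a b mine

# Hypothetical old and new versions, plus our own copy of the old one.
printf '%s\n' 'First line' 'It is quite simple.' > a/file1.txt
printf '%s\n' 'First line' 'It has been edited.' > b/file1.txt
cp a/file1.txt mine/file1.txt
diff -uN a b > multipatch.txt || true

cd mine || exit 1

# Apply the patch as usual: file1.txt now matches the new version.
patch -p1 < ../multipatch.txt

# Apply the same patch in reverse: added lines are removed and removed
# lines are put back, so file1.txt returns to the old version.
patch -R -p1 < ../multipatch.txt
```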