There are many tools for comparing files, many of them built in to programming environments and source control systems. Of course some tools are graphical and some just for the command line, but the aim is the same - to compare two (or three) files and show you what is different between them.
A patch takes this idea one step further, so that you can generate a small description of the changes. If you then apply that patch to the original file, the result is the file after making those changes, which is the second file.
This is a powerful way to communicate changes, because you don't have to send the whole file, just the small change. It is especially convenient when a large set of files has to be modified, because a single patch can describe multiple changes to multiple files.
In general, a tool like diff is used to create the descriptions, and a tool like patch is used to apply the change. The changes themselves may be called either patches or diffs.
Firstly, let's compare two versions of a simple file using the GUI tool "meld", which comes with GNOME. You can do the same thing with other tools such as kdiff3.
This example uses a little text file, with some changes made to the file between versions. One line has been deleted, one modified, and one added, as you can see in the following screenshot:
Secondly, let's do the same comparison using diff on the pair of files.
$ diff firstfile secondfile
3d2
< Third line
5c4
< A line with a speling mitsake
---
> A line with a spelling mistake
7a7
> An extra line
This output is certainly not as easy to understand as the graphical output, but it contains the same information. The lines starting with "<" are only in the first file (and will be removed by applying the patch); the lines starting with ">" are only in the second file (and will be added by the patch). The modified line is represented by a removal and then an addition.
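If you want to try this yourself, the comparison can be reproduced with a pair of throwaway files. The sketch below is only an illustration: the file contents are assumptions, chosen so that the output matches the listing above (the original files aren't reproduced in full here), and the scratch directory is just a safe place to work.

```shell
# Work in a scratch directory so nothing important gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical contents for the first version (7 lines).
printf '%s\n' \
  'First line' \
  'Second line' \
  'Third line' \
  'Fourth line' \
  'A line with a speling mitsake' \
  'Sixth line' \
  'Seventh line' > firstfile

# Second version: line 3 deleted, line 5 corrected, one line appended.
printf '%s\n' \
  'First line' \
  'Second line' \
  'Fourth line' \
  'A line with a spelling mistake' \
  'Sixth line' \
  'Seventh line' \
  'An extra line' > secondfile

# diff exits with status 1 when the files differ, which is not an error here.
diff firstfile secondfile > diff_output.txt || true
cat diff_output.txt
```

The short codes between the marked lines say where each change sits: "3d2" means line 3 of the first file was deleted (it would have followed line 2 of the second file), "5c4" means line 5 of the first file was changed into line 4 of the second, and "7a7" means that after line 7 of the first file, line 7 of the second file was added.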
The output from the diff command shown above is the patch. It describes all the changes necessary and can be used as input to the patch command. All we need to do is save the patch by redirecting the output of the diff command into a file.
$ diff firstfile secondfile > mypatch.txt
So now we've got a patch file called "mypatch.txt" which has the same contents shown above. For testing this out, we can apply the patch to a copy of the file, so we can keep the original too.
$ cp firstfile dummy.txt
$ patch dummy.txt mypatch.txt
patching file dummy.txt
Now dummy.txt has been patched: its contents have been replaced with a version that includes all the changes, so it should be the same as "secondfile".
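Rather than checking by eye, you can let the cmp command confirm that the patched copy really matches. Here is a minimal sketch of the whole round trip, assuming two small made-up files in place of the real ones:

```shell
# Work in a scratch directory so nothing important gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the two versions of the file.
printf '%s\n' 'First line' 'Second line' 'Third line' > firstfile
printf '%s\n' 'First line' 'Second line, edited' 'Third line' 'An extra line' > secondfile

# Save the differences as a patch (diff exits with 1 when the files differ).
diff firstfile secondfile > mypatch.txt || true

# Apply the patch to a copy so that the original is kept.
cp firstfile dummy.txt
patch dummy.txt mypatch.txt

# cmp -s prints nothing and exits with 0 when the files are identical.
if cmp -s dummy.txt secondfile; then
  echo "dummy.txt now matches secondfile"
fi
```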
And those are the basics of the patch. If someone has the "firstfile", and you've changed it to make "secondfile", you can just send this patch. The other person can apply it to their copy of the first file, and they'll get the same file as you have.
A common use for patches is for software projects, where changes need to be made to multiple files. It would be awkward to make a patch for each file, so patches have a way of specifying which files need to be patched and how. This uses the same principle as before, but employs different options to make a diff for a complete set of files instead of just for one file.
As an example we'll take the same files used in the rsync guide as follows:
| Folder | File | Contents |
|---|---|---|
| Desktop | file1.txt | This is the main version of the first file. |
| Desktop | file2.txt | This is the second file. |
| Desktop | file3.txt | This one will be changed on the host but not on the laptop. |
| Desktop | file5.txt | This file only exists on the host, so it's like it's been deleted from the second set. |
| Laptop | file1.txt | This is the second version of the first file. |
| Laptop | file2.txt | This is the second file. |
| Laptop | file3.txt | This one will be changed on the host but not on the laptop. |
| Laptop | file4.txt | This file is new in the second set. |
So now we've got two folders of files (called "a" and "b" in the command below), and we can use the diff command as before to create a patch file, except that this time we diff the whole directories:
$ diff -uN a b > mypatch.txt
There are two new options here. The first is "-u", which makes a unified diff that includes extra surrounding context (lines before and after each change). This makes the diff a bit bigger, but makes it more reliable in case the file being patched isn't exactly the same as the one which was diffed. It helps readability too, making it more obvious where the edit is made. The second is "-N", which lets it include the content of new files in the patch, rather than just observing that there's a new file.
Inside this new patch is a section for each file which was compared. The part for file1.txt looks like this:
$ diff -uN a b
diff -uN a/file1.txt b/file1.txt
--- a/file1.txt 2011-10-21 16:09:04.000000000 +0200
+++ b/file1.txt 2011-10-21 16:09:11.000000000 +0200
@@ -1,3 +1,3 @@
 This is the main version of the first file.
-It's quite simple.
+It starts off quite simple, but then has some extra text added to it.
So it generates a slightly different format but with the removed (-) and added (+) lines as before. It also contains the relative file paths a/file1.txt and b/file1.txt. This is important because it is this a/file1.txt which the patch command will look for when applying the patch.
Note that the patch contains nothing at all about file2.txt: it was found to be identical in both folders, so it wasn't even mentioned in the diff.
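A quick way to see which files a patch touches is to list its per-file header lines. The following sketch builds two small folders and checks that only the changed and new files show up. The file contents are made up for illustration and only loosely follow the example set.

```shell
# Work in a scratch directory so nothing important gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"
mkdir a b

# Hypothetical file contents: file1 differs, file2 is identical, file4 is new.
printf '%s\n' 'This is the first file.' "It's quite simple." > a/file1.txt
printf '%s\n' 'This is the first file.' 'It has some extra text added to it.' > b/file1.txt
printf '%s\n' 'This is the second file.' > a/file2.txt
cp a/file2.txt b/file2.txt
printf '%s\n' 'This file is new in the second set.' > b/file4.txt

# -u: unified format with context; -N: treat a missing file as empty.
diff -uN a b > mypatch.txt || true

# Each file section has a "+++ " header naming the file in the second folder.
grep '^+++ ' mypatch.txt
```

The listing should name b/file1.txt and b/file4.txt but not file2.txt. Note that this only looks at the files directly inside the two folders; if they contain subdirectories, diff also needs the -r option to descend into them.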
So now imagine you've got your own folder of these files, and you've been sent this patch. Now you want to use the patch command to apply this patch to your set. It sounds straightforward, but the complication comes from the relative paths, because your directory isn't called "a".
If you're in your folder with all your files, you need to strip off one item from the paths in the patch file, so that "a/file1.txt" becomes "file1.txt" and then patch can find the file. This is especially important if there is a tree of directories involved, so that "a/somedir/file8.txt" becomes "somedir/file8.txt". The way this is done is using the -p parameter to patch, like this:
$ patch -p1 < mypatch.txt
Here -p1 tells patch to strip 1 item from the start of each path. The < mypatch.txt takes input from the patch file, and patch will figure out which files should be patched. If it works, the output will tell you which files were successfully patched; if not, you will get a bunch of .rej files containing the parts of the patch that could not be applied to each file. Most likely this is because it can't find the file to patch, and the answer is probably in the -p parameter.
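To see the path stripping in action, here is a minimal sketch that applies a folder patch inside a directory with a different name. The folder "mine" and all the file contents are assumptions for the example, and the --dry-run option used for the trial run is the one provided by GNU patch, so it may not be available in other versions.

```shell
# Work in a scratch directory so nothing important gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"
mkdir a b mine

# Hypothetical "before" (a) and "after" (b) folders used to build the patch.
printf '%s\n' 'This is the first file.' "It's quite simple." > a/file1.txt
printf '%s\n' 'This is the first file.' 'It has some extra text now.' > b/file1.txt
printf '%s\n' 'This file is new in the second set.' > b/file4.txt
diff -uN a b > mypatch.txt || true

# "mine" stands in for your own copy of the original files.
cp a/file1.txt mine/

cd mine
# Trial run first: reports what would be patched without changing anything.
patch -p1 --dry-run < ../mypatch.txt
# Apply for real: -p1 strips the leading "a/" or "b/" from each path.
patch -p1 < ../mypatch.txt
cd ..

# diff -r compares the folders recursively and prints nothing if they match.
diff -r mine b && echo "mine now matches b"
```

Doing the trial run first is a cheap way to find out whether the -p value is right before any files are modified.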
Of course this is just scratching the surface of the diff and patch commands. There are lots of other options, including making backups, merges, and reverse patches. For more information see the man pages.
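As a small taste of two of those options, the sketch below patches a file while keeping a backup, and then undoes the change with a reverse patch. The files are made up for the example, and the backup name dummy.txt.orig is the default suffix, which your version or environment settings may change.

```shell
# Work in a scratch directory so nothing important gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-ins for the two versions of the file.
printf '%s\n' 'First line' 'Second line' > firstfile
printf '%s\n' 'First line' 'Second line, edited' > secondfile
diff -u firstfile secondfile > mypatch.txt || true

cp firstfile dummy.txt
# -b saves the unpatched file as dummy.txt.orig before applying the patch.
patch -b dummy.txt mypatch.txt

# -R applies the patch in reverse, turning the new version back into the old one.
patch -R dummy.txt mypatch.txt

cmp -s dummy.txt firstfile && echo "dummy.txt is back to the original"
```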