First Time Linux

Version control

What is version control and why would you want it? Basically it's a tool to store the whole history of a set of files, so when you change a file, you can always go back and look at any of its previous versions. It also lets you compare the versions, see what's changed, and who changed it and why. It's an essential tool for developing software, to keep track of all changes and bug fixes, but it can be used for absolutely any kind of file at all - your web site, your shopping lists, your drawing masterpieces, spreadsheets, anything.

A big benefit of using such a system is that several people can collaborate on the same set of files - and the system will make sure that no one's changes are lost. It can also maintain several so-called 'branches' of the files side by side, either for different target systems or for different customers, for example. In this demonstration though, we'll just be exploring a local repository on the same machine, with a single user and a single trunk. With this I'll be able to keep track of all the edits to this website, for example, and any other stuff I feel like putting in there.

Subversion

I chose subversion because of its similarities to CVS and for its excellent documentation. The install was relatively simple, using urpmi, although it did ask at install time whether I wanted the Berkeley DB packages or the FSFS ones. I chose FSFS, the file system variant, to try to keep things as transparent as possible. Performance is not an issue, but stability and reliability are.

Note that for just running the client you only need the subversion packages, but because we want to create and manage our own local repository, we need the subversion-repos and subversion-repo-tools packages as well.

Creating a new repository

Firstly, create a new, empty repository:

$ svnadmin create --fs-type fsfs path/to/repository

Then we'll take some simple text files and import them into the repository. We'll use the directory "testfiles" made for this very purpose:

$ svn import testfiles file:///home/me/path/to/repository --message "First import"

Note that for the import command we have to specify a URL for the repository (in our case, just a file:/// URL with the absolute path) rather than a relative file path. This shows the flexibility of the system: that URL could point anywhere, including another machine, and the import (and the other commands later) would still work in exactly the same way.
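Since the URL has to contain the absolute path, a small shell function can build it from a relative one. This is only a sketch: the name repo_url is made up for this example, and it assumes the path contains no spaces or other characters that would need escaping in a URL.

```shell
# Hypothetical helper (not part of Subversion): print the file:// URL
# for a repository directory given as a relative or absolute path.
repo_url() {
    # Resolve the directory to an absolute path in a subshell,
    # so the caller's working directory is left alone.
    ( cd "$1" && printf 'file://%s\n' "$(pwd)" )
}

# Example: show the URL for the current directory.
repo_url .

# It could then be used for the import like this:
#   svn import testfiles "$(repo_url path/to/repository)" --message "First import"
```

The subshell means the function never changes the directory you're working in.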

So now we've got our files in the repository, let's look and see what's there:

$ svnlook tree path/to/repository
/
 file2.txt
 subdir1/
  subfile1.txt
 file1.txt

So here there are two files (file1.txt and file2.txt) in the root, and a third file subfile1.txt in a subdirectory subdir1. In a real, working repository, some thought should go in here as to the best directory structure, perhaps various projects in their own directories, and possibly trunk and branches directories (see documentation). But here we just want to test that it works and get it going.
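For a real repository, one way to get such a structure is to lay it out in a staging directory first and import that instead of the bare files. The following is a rough sketch only: the project name "myproject" is invented, and the tags directory is a common convention rather than something subversion requires.

```shell
# Assumed names: "myproject" and the staging directory are illustrative only.
staging=$(mktemp -d)

# One directory per project, each with the conventional subdirectories.
mkdir -p "$staging/myproject/trunk" \
         "$staging/myproject/branches" \
         "$staging/myproject/tags"

# Copy the files to be imported into trunk (skipped if testfiles is absent).
if [ -d testfiles ]; then
    cp -R testfiles/. "$staging/myproject/trunk/"
fi

# Then import the staging directory in one go:
#   svn import "$staging" file:///home/me/path/to/repository --message "Initial layout"
```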

Checking out

The files which were imported can stay where they are until we've checked that everything is safely stored in the repository. But from now on we won't edit those files. Instead we'll check out a fresh copy of the whole file set into a new, controlled directory:

$ mkdir working
$ cd working
$ svn checkout file:///home/me/path/to/repository .

And now when we do an ls we can see all the files checked out of the repository:

$ ls -l
total 12
-rw-r--r--  1 me me   48 Oct 21 15:28 file1.txt
-rw-r--r--  1 me me   24 Oct 21 15:28 file2.txt
drwxr-xr-x  3 me me 4096 Oct 21 15:28 subdir1/

Interestingly the files are writable, so even though I haven't reserved the files or said that I want to edit them, I can edit them immediately if I want. This is subversion's model of change control, which doesn't lock the files from other users. If someone else wants to edit the same file, they can, and they will only discover the conflict when they try to check in their changes. Note also that subversion's own bookkeeping information is in this directory too, but it isn't displayed because it's hidden. If you really want to see it:

$ ls -la
total 24
drwxr-xr-x   4 me me 4096 Oct 21 15:28 ./
drwx------  59 me me 4096 Oct 21 15:27 ../
-rw-r--r--   1 me me   48 Oct 21 15:28 file1.txt
-rw-r--r--   1 me me   24 Oct 21 15:28 file2.txt
drwxr-xr-x   3 me me 4096 Oct 21 15:28 subdir1/
drwxr-xr-x   7 me me 4096 Oct 21 15:28 .svn/
$ ls -la subdir1/
total 16
drwxr-xr-x  3 me me 4096 Oct 21 15:28 ./
drwxr-xr-x  4 me me 4096 Oct 21 15:28 ../
-rw-r--r--  1 me me   36 Oct 21 15:28 subfile1.txt
drwxr-xr-x  7 me me 4096 Oct 21 15:28 .svn/

So both of these directories (and all of them, in fact) contain a hidden directory called .svn, which holds all the information about which version has been checked out and when. (Newer subversion releases, from 1.7 onwards, keep a single .svn directory at the top of the working copy instead.) Don't mess with this data, as you could corrupt the working copy. But it's useful to know that it's there.

Editing the checked out copy

As mentioned above, the files are already writable, so I don't need to tell subversion which ones I want to edit. I just edit them. I edit file1.txt and add a new file, file3.txt. Then after I've made sure that my changes are correct, I ask subversion to check the status of my file tree:

$ svn status
?      file3.txt
M      file1.txt

So in this coded form it tells me that file3.txt is unknown (a new file not yet in the repository) and file1.txt has been modified. Not bad! Now let's see what the changes have been to file1.txt:
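Because the status code is always at the start of the line, the output is easy to filter. As a small sketch, the function below (list_unversioned is a name made up here) picks out just the unversioned files. It is fed the sample output from above rather than a live working copy, and it assumes file names don't begin with spaces.

```shell
# Hypothetical helper: print the paths that `svn status` marks with "?".
list_unversioned() {
    # Keep only lines starting with "?", and strip the code and padding.
    sed -n 's/^?[[:space:]]*//p'
}

# Sample input copied from the status output above.
# In a real working copy this would be: svn status | list_unversioned
printf '?      file3.txt\nM      file1.txt\n' | list_unversioned
```

With the sample input this prints only file3.txt, which is exactly the file that still needs to be added.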

$ svn diff
Index: file1.txt
===================================================================
--- file1.txt   (revision 1)
+++ file1.txt   (working copy)
@@ -1,2 +1,3 @@
 This is my first file
-as it is when I import it
+as it was when I imported it
+but now I've added an extra line

This shows me that one line has been edited (from the line with a "-" to the line with a "+") and a third line has been added.
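The output is in the standard "unified diff" format, so the same hunk can be reproduced with the ordinary diff tool, no repository needed. A minimal sketch, using two temporary files holding the old and new contents shown above:

```shell
tmp=$(mktemp -d)

# The file as it was imported (revision 1).
cat > "$tmp/old.txt" <<'EOF'
This is my first file
as it is when I import it
EOF

# The file after editing (the working copy).
cat > "$tmp/new.txt" <<'EOF'
This is my first file
as it was when I imported it
but now I've added an extra line
EOF

# diff exits with status 1 when the files differ, so don't treat that as a failure.
diff -u "$tmp/old.txt" "$tmp/new.txt" || true
```

The header line "@@ -1,2 +1,3 @@" says that the hunk covers 2 lines starting at line 1 of the old file, and 3 lines starting at line 1 of the new one.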

Checking the changes back in

We've seen from the status command that one file has not been added to the repository - let's add it:

$ svn add file3.txt
A         file3.txt

The file is now marked for addition the next time we commit. Let's commit our changes now, making revision number 2, and give a short message to explain what the changes are for:

$ svn commit --message "added and edited"
Sending        file1.txt
Adding         file3.txt
Transmitting file data ..
Committed revision 2.

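And now the new changes have been sent to the repository. Subversion numbers the state of the whole tree rather than individual files, so in the repository every file (whether it has changed or not) is now part of revision 2. Files in the working copy that weren't part of the commit stay marked as revision 1 until the next svn update.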

We can now see both versions of file1.txt, as follows:

$ svn cat -r 1 file1.txt
This is my first file
as it is when I import it
$ svn cat -r 2 file1.txt
This is my first file
as it was when I imported it
but now I've added an extra line

So by this method we can see both (or all) historical versions of this file, compare them with diff and list the reasons for each of the changes as well. It's an extremely powerful tool, and as mentioned it's not just for program code!
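To try the whole sequence without touching any real files, the steps so far can be replayed in a throwaway directory. This is a sketch under a few assumptions: the function name svn_demo, the temporary paths and the file contents are all made up for the demonstration, and it simply skips itself if the svn tools aren't installed.

```shell
#!/bin/sh
# Sketch: replay the walkthrough in a temporary directory.
svn_demo() {
    if ! command -v svn >/dev/null 2>&1 || ! command -v svnadmin >/dev/null 2>&1; then
        echo "svn/svnadmin not installed, skipping demo"
        return 0
    fi

    demo=$(mktemp -d) || return 1

    # Create the repository; mktemp gives an absolute path for the URL.
    svnadmin create --fs-type fsfs "$demo/repo" || return 1
    url="file://$demo/repo"

    # Import a single test file: this becomes revision 1.
    mkdir "$demo/testfiles"
    printf 'This is my first file\n' > "$demo/testfiles/file1.txt"
    svn import -q "$demo/testfiles" "$url" --message "First import" || return 1

    # Check out a working copy, edit one file and add another.
    svn checkout -q "$url" "$demo/working" || return 1
    printf 'but now I have added an extra line\n' >> "$demo/working/file1.txt"
    printf 'A brand new file\n' > "$demo/working/file3.txt"
    svn add -q "$demo/working/file3.txt" || return 1

    # Commit both changes together: this becomes revision 2.
    svn commit -q --message "added and edited" "$demo/working" || return 1

    # Print the latest revision number held in the repository.
    svnlook youngest "$demo/repo"
}

svn_demo
```

Everything lives under one temporary directory, so deleting that directory afterwards removes the repository and the working copy together.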

Checking what's changed

Now let's say you've committed your changes, and two weeks later you want to see which files changed. Easy. Just go to the root of your checked-out file tree, and see what the current revision is:

$ svn status -u
Status against revision:   5

This tells us that the latest revision in the repository is number 5. Because no files are listed, it also tells us that nothing in the local tree is modified or out of date. Now let's see the log for revision 5 (-v means verbose, and -r5 selects revision number 5):

$ svn log -v -r5

This lists all the files which changed in revision 5, and the comment with which these changes were committed. Obviously you can also see the log for any other revision just by changing the number in the parameter -r5.

Backups

There are two ways you might want to perform backups - firstly you may just want to save a copy of the latest version of everything, and secondly you may want to back up the whole repository so that all historical versions are faithfully preserved.

To just save the latest copy, the simplest thing to do is of course just to copy the working directory. But remember that the working directory also contains hidden administrative files, which subversion uses to keep track of things, and you certainly don't need to back those up. In this case it would be better to export a copy of the files to a separate directory. An exported tree is not under subversion's control, so edits made there can't be committed: treat it as a snapshot and don't edit it. To export the tree, create a directory in which to save everything, and run the export command from inside it:

$ svn export file:///home/me/path/to/repository .

This copies everything (but only those files under subversion control!) from the repository into the current directory (.).

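To back up the whole repository, assuming you've got a local, file-based (FSFS) repository, just copy the directory tree while nothing is writing to it. If the repository might be in use at the time, svnadmin hotcopy makes a consistent copy instead.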
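Copying the tree can be as simple as packing it into a dated archive. The function below is a minimal sketch of that idea: the name backup_repo and the archive naming scheme are my own invention, and it assumes nothing is committing to the repository while tar runs.

```shell
# Hypothetical helper: pack a repository directory into a dated tar archive.
# Usage: backup_repo path/to/repository path/to/backup/dir
backup_repo() {
    repo=$1
    dest=$2
    stamp=$(date +%Y%m%d)
    archive="$dest/$(basename "$repo")-$stamp.tar.gz"

    # -C changes into the parent directory first, so the archive contains
    # just the repository directory and not the full path leading to it.
    tar -czf "$archive" -C "$(dirname "$repo")" "$(basename "$repo")" \
        && echo "$archive"
}

# Example call, with made-up paths:
#   backup_repo /home/me/path/to/repository /home/me/backups
```

On success it prints the name of the archive it created, which is handy if the function is called from a nightly script.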

Other clients

One big advantage of the open format and the server/client separation is that you can use a variety of clients to access the repository, not just the provided console interface. One very powerful alternative is to embed the controls directly into your IDE, for example using the Subclipse plugin for Eclipse.

I came across some problems with this, as I tried to migrate to Subclipse with an already-existing repository created from the command line. I'm not sure what the problem was, but eventually the whole repository seized up and I had to start from scratch again. After creating the repository from Subclipse, it seems happier. So far.

More info

Wikipedia has information on various version control systems, including Subversion. The official subversion home page is at subversion.tigris.org, and the excellent documentation is available in various formats at svnbook.org.