First Time Linux

Contributing to Open Source projects

It's quite amazing how much software and other media (literature, music, documentation, art) the open source movement has produced and made available to everyone. It's unfathomable how many person-hours of hard work have gone into these efforts, and inconceivable why so much labour would be done in people's free time and simply donated without any reward whatsoever.

It is of course possible to take all the fruits of these mammoth efforts, use them without obligation, enjoy the benefits and never be committed to do anything or pay anything in return - all the resources are freely available. However, those who do feel grateful for the benefits they've obtained from the community, and would like to contribute something back in return, can do so.

Contributing back can take many forms - for example correcting or enhancing a piece of work, submitting error reports or suggestions, translating something into an additional language, or even creating new works and making them available. It doesn't have to involve programming; it can be as simple as pointing out a spelling mistake or answering a question posted on a forum.

And why would someone do that? It's an interesting question, and different people have different motives. Some want their contribution to be recognised and have a mention by name, others are happy to know that they've given something back, however small. Others enjoy the feeling of being part of a wider community that has produced this amazing world of stuff. It's fascinating how little encouragement or prodding goes on to try to persuade people to contribute - often people complain of the opposite, that they've tried to help but been prevented, for whatever reason. The amount of volunteered time and effort available to be harnessed is phenomenal, and will presumably only grow as the open source movement gathers pace and attention.

Anyway, as one of those who have greatly benefited from the available work (as witnessed by the system being used to write this page), I have also made a few small gestures back to the community, in the hope that someone somewhere could benefit from them. One of them is this website, and hopefully others in similar boats exploring Linux have benefited from not having to repeat some of my mistakes and investigations. Some of the others are listed here, with varying degrees of success.

Project Gutenberg

My first experience of free and open source products was Project Gutenberg (promo.net/pg). This isn't software, it's a massive collection of famous literary (and not so literary) works, including books, plays, poems and essays. These are works which either were never copyrighted or whose copyrights have lapsed, so they fall into the public domain and can be accessed by anyone. The people behind this project work hard to obtain, scan, convert and make available these texts at no charge. Brilliant service! As well as being cheaper than buying electronic books, these have the great advantage that they're in raw, accessible text format (or sometimes HTML), so you can search them, copy them, reformat them, print them, read them, quote them, and in fact do what you want with them. You get much more flexibility than with some binary, undecipherable format.

Something that these texts have, however, is the occasional mistake. Either the text recognition wasn't quite right, or something got mangled somehow, and you get the odd typo here and there. With a paper book, or an encrypted binary book, you're more or less stuck with these errors, but with a text file you can just correct it. And, true to the spirit of free availability, I submitted my observations back to the team responsible. What do you know, the mistakes got corrected and a new version went online! This is a small example of the feedback that users of free stuff can provide, to actively improve the stuff on offer.

Step forward a couple of years and the texts of Cory Doctorow (craphound.com) catch my attention. He's taken the approach one step further: as well as providing the texts of his books for free online (and selling the paper versions), he makes a wiki available for feedback. This makes it even easier to provide comments, and within hours my notes had been reviewed and the corrections put online.

Result: Easy to make corrections, easy to submit back to the creators and get the changes made.

Garble

As mentioned elsewhere on this site, I had been trying out a program called garble (sourceforge.net/projects/garble) to read the data from my GPS and archive it on my computer. However, a few things weren't quite right: the altitude measurements for the waypoints and track points were missing, and the screenshot function didn't work for me. After failing to contact the author, I dug around the code a little bit and after some experimentation I fixed the things which had been bothering me and got it doing what I wanted. The next step, of course, was to supply my changes back to the author, but further attempts to contact both him and the maintainer of GpsDrive (another application which repackages garble as part of its distribution) also failed. Trickier than I thought, this free software model.

Eventually I decided to publish my changes here as a new version of the code, and the results (as well as the archived previous versions from various websites) can be found at the garble page.

Result: Made the modifications but impossible to submit back to the author. Published changed version here instead.

Freedict

I'd been using dictd as a local dictionary server, to allow me to look up foreign words even without an internet connection. Unfortunately I noticed some mistakes in the German<->English dictionaries (obtained from freedict.de) and again, attempts to contact the originator of these files at freedict.de weren't successful. An example of the problems is that of the German verb "gehen" - with the standard files the translation into English is "to go {went" and the lookup of "to go" fails - only the lookup of "to go went" finds the corresponding entry "gehen". This is apparently a mistake by the parsing program which automatically generates these dictionary files. So with help from the word file from ding (url) I set about trying to correct these problems. After writing some perl to parse the ding file and create separate English->German and German->English files, I got the utilities dictfmt and dictzip to create the files for use by dictd. Now I've got the dictionary files built, but I still can't contact the freedict team so these results remain unpublished.
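The splitting step described above could be sketched roughly as follows. This is an illustration in Python rather than the Perl actually used, and it assumes a simplified line format of "german :: english" with ";" separating synonyms and "#" starting a comment; the real ding file has more structure than this. The function name parse_ding_lines is made up for the example, and the final conversion with dictfmt and dictzip is not shown.

```python
from collections import defaultdict


def parse_ding_lines(lines):
    """Build German->English and English->German indexes.

    Assumes simplified lines of the form 'german :: english', with ';'
    separating synonyms on either side. This is an assumption made for
    the sketch, not a full description of the ding format.
    """
    de_en = defaultdict(list)
    en_de = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and anything without a separator
        if not line or line.startswith("#") or "::" not in line:
            continue
        german_part, english_part = line.split("::", 1)
        german_words = [w.strip() for w in german_part.split(";") if w.strip()]
        english_words = [w.strip() for w in english_part.split(";") if w.strip()]
        # Register every pairing in both directions, without duplicates
        for de in german_words:
            for en in english_words:
                if en not in de_en[de]:
                    de_en[de].append(en)
                if de not in en_de[en]:
                    en_de[en].append(de)
    return dict(de_en), dict(en_de)


sample = [
    "# simplified sample lines, invented for this example",
    "gehen :: to go; to walk",
    "laufen; rennen :: to run",
]
de_en, en_de = parse_ding_lines(sample)
print(en_de["to go"])   # ['gehen']
print(de_en["gehen"])   # ['to go', 'to walk']
```

Keeping each headword as its own key is what makes the lookup of "to go" succeed on its own, instead of only matching a mangled combined entry like "to go went".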

I got the rpms for the Spanish<->English dictionaries too, and added them to the dictd server, but for some reason each entry is included four times in the rpm. This means that every time a word is looked up, the answer is returned four times! I managed to extract the text file and edit it down to a single copy of the text, but again I have no way of submitting the changes back to the team who made the rpms. I checked the files at freedict and they are correct (although still limited in vocabulary), so the duplication must have happened in the packaging to rpm.
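Editing the text down to a single copy of each entry is the kind of job a few lines of script can do. The sketch below is only an illustration: it assumes the extracted text has one entry per block, with blocks separated by blank lines, which may not match the real layout of the extracted file. The function name is invented for the example.

```python
def drop_duplicate_entries(text):
    """Keep only the first copy of each entry, preserving the original order.

    Assumes entries are blocks of lines separated by blank lines
    (an assumption for this sketch).
    """
    seen = set()
    kept = []
    for block in text.split("\n\n"):
        entry = block.strip()
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return "\n\n".join(kept) + "\n"


# Invented sample with a repeated entry
sample = "casa\n  house\n\ncasa\n  house\n\nperro\n  dog\n\ncasa\n  house\n"
print(drop_duplicate_entries(sample))
```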

Result: Made the modifications but impossible to submit back to the author. I will probably publish the German dictionaries here at a later date.

Filelight

As mentioned elsewhere on this site (where?), Filelight (methylblue.com/filelight) is a useful, intuitive and good-looking application to draw hierarchical pie charts of disk usage. Point it at a directory on your file system, and it will show you instantly which files or directories are taking up the most space. What I wanted to do is an extension of that, to use the same presentation engine to display any kind of data rather than just file sizes. I sent a mail to the author describing what I imagined:

I have a feature request for filelight, I hope you'll consider adding it to a future release. What I have in mind is that filelight would be able to construct an identical-looking display but for something other than file sizes. In essence this could be done by reading a "cache file" like kdirstat does, and constructing the tree from the contents of that file instead of from scanning the file system for file sizes.

The benefit of this would be that I could write a cache file based on any properties of a tree of files, not necessarily just the file size. I'm thinking of web statistics here, so I could display the number of hits across all the html pages in a tree, grouping by subdirectories and sub-subdirectories to see which parts of the tree are most popular (instead of which parts of the tree take the most space). An additional benefit (as intended by kdirstat) would be that a large (and relatively static) file tree could be scanned overnight to produce the cache file, and then filelight could operate much faster based on the cached file contents rather than scanning the tree at runtime.

The way I could see this being added to filelight is by additional command line options. With no parameters, or with a directory as a parameter, it would operate exactly as it does today. With a _file_ as a parameter, it could read the contents of this file, parsing each line of the file into file location (including relative path from the tree root) and file size. The data model which is currently built from scanning the file system would then be constructed from the contents of this file instead. Everything else, the display, the tooltips, the colours and the navigation would be exactly the same as at present.

If you determine the format of the cache file (and you can probably use kdirstat as a guide I would imagine), then I could write a similar cache file based on web hits instead of file sizes and I would get an immediate, interactive, beautiful multi-layer diagram showing exactly what's being viewed. And I would guess that there are many other applications of this kind of view if they could write a similar cache file. Maybe a source control system could write a file showing the number or sizes of edits to a file tree - plug the file into filelight and see exactly where the volatile areas of the tree are, for example. Or maybe I could generate a cache file also based on file sizes, but only considering files of a certain type, or only including files changed in the last month, or whatever. Once filelight is capable of reading the cache file, then the possibilities for generating that file can be invented at whim.

I hope you'll consider this idea, (if you haven't considered it already!) and hope you manage to implement it in a future release of filelight.

Having received no reply from the author, I tackled the task of modifying the code myself, and got it working as far as reading the file and displaying the data in the same way as for file sizes. It already provides a useful view of website statistics and even allows drilldown to see which pages within a directory are the most popular. Unfortunately as the author is still not contactable, there's no way to give my changes back to be included in future releases.
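To give an idea of what reading such a file involves, here is a minimal sketch in Python. It is not the modified Filelight code, and the file format is an assumption made for the example: one line per page, holding a relative path, a tab, and a number (hits rather than bytes). Each value is added to every directory above the file, which is what makes the drilldown view possible.

```python
def build_tree(lines):
    """Build a nested tree of totals from 'relative/path<TAB>value' lines.

    The line format is assumed for this sketch; it is not kdirstat's
    actual cache format.
    """
    root = {"name": "", "total": 0, "children": {}}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        path, _, value = line.rpartition("\t")
        amount = int(value)
        node = root
        node["total"] += amount
        # Walk down the path, creating nodes as needed and adding the
        # value to every level so that directories show their subtotals
        for part in path.strip("/").split("/"):
            node = node["children"].setdefault(
                part, {"name": part, "total": 0, "children": {}}
            )
            node["total"] += amount
    return root


def largest_children(node):
    """Return (name, total) pairs for a node's children, biggest first."""
    pairs = [(c["name"], c["total"]) for c in node["children"].values()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Invented web-hit figures for illustration
hits = [
    "linux/index.html\t120",
    "linux/contrib.html\t30",
    "gps/prune/index.html\t200",
    "index.html\t50",
]
tree = build_tree(hits)
print(largest_children(tree))  # [('gps', 200), ('linux', 150), ('index.html', 50)]
```

Calling largest_children on a child node such as tree["children"]["linux"] gives the same ranking one level down.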

Result: Made the modifications but impossible to submit back to the author. Still a couple of bugs remaining so not publishable yet.

Digikam

I was running version 0.7.2 of Digikam and found it a great application for uploading, managing and viewing photos with a sophisticated tagging system. I upgraded to 0.8.1 using the source tarball on their site in the hope of improved searching functions and maybe other new features too. I managed to compile it successfully, but noted that there were several irritating spelling mistakes in the texts. Delving into the code myself, I figured out where the texts were stored (in a separate English language file), and noted many more spelling and grammar mistakes in there. I contacted their developers' mailing list to see how I could best submit my observations to them, but there was much confusion about who was responsible as the translations are done by a KDE team. I submitted a sed script anyway which would do a global search/replace on the project files (and required just a few tens of bytes instead of a 400kb diff file), but this was rejected by the team because they decided they wanted a diff against the current cvs version of the code instead of the released 0.8.1 code. So I went back and got the cvs code, ran my script and posted the resulting diff back to the team (and then posted it again, because the format wasn't quite right the first time). It turned out that many of my corrections had already been made the previous week, which was a lesson. All in all the developers seemed quite reluctant to accept the changes, and didn't acknowledge them once submitted. It appears they use other languages themselves, so English mistakes are perhaps lower priority.
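The difference between the two submission formats can be shown with a small sketch. This is Python rather than the sed script actually submitted, and the misspellings and the file name are invented examples, not the real Digikam strings. A short table of corrections plays the part of the script, and the standard difflib module turns the result into the unified diff that developers usually prefer.

```python
import difflib
import re


def apply_corrections(text, corrections):
    """Replace each misspelled whole word with its corrected form."""
    for wrong, right in corrections.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text


def make_diff(original, corrected, filename):
    """Return a unified diff between the original and corrected text."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            corrected.splitlines(keepends=True),
            fromfile="a/" + filename,
            tofile="b/" + filename,
        )
    )


# Hypothetical strings and corrections, for illustration only
original = 'msgid "Recieve thumbnails"\nmsgid "Occured while saving"\n'
corrections = {"Recieve": "Receive", "Occured": "Occurred"}

corrected = apply_corrections(original, corrections)
print(make_diff(original, corrected, "strings.txt"))
```

The corrections table stays tiny however many files it touches, while the diff grows with every changed line, and the diff is only valid against the exact version of the files it was generated from.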

Some eight months after submitting the spelling corrections, I noted that even the very latest CVS code still has the same mistakes in it, so I again contacted the developers. They expressed surprise that the changes had not been made, but even after prompting with the mail archives link, they still completely ignored the issue and it will probably remain so.

Result: Spelling corrections eventually (after some effort) apparently accepted by the development team but several months later they have still been completely ignored, without comment or feedback. The latest release still has the mistakes in it, so I guess if the developers don't make a change themselves, they never accept it.

Kate

The standard text editor which comes with KDE is a great editor called Kate, which I'm using now to write this page, but there are a few annoying bugs with the search and replace function. First of all, find doesn't work at all if you're in "block mode" (Ctrl-Shift-B), but it doesn't give an error message, it just says that your term wasn't found. Very confusing, and irritating. Secondly, any regular expression search for the end of a line doesn't match, however it's entered. The "Edit" function for the regular expression offers both newline (\n) and Carriage Return (\r) as options, but neither matches anything. Another bug is the "Whole words only" option of the find tool, which does not work at all. I tried reporting these bugs to bugs.kde.org but was foiled at the first hurdle - the system requires entering the KDE version affected by the bug, and my 2005 version of Mandriva has an old KDE which is no longer even an option in the bug tool. So we can assume that it's now so far out of date that any bug reports are useless. It also seems that upgrading Kate requires upgrading the whole of KDE, and that would be a whole world of hassle without upgrading to Mandriva 2006, which would come with its own problems. So I'm stuck with this Kate and can't report the bugs or upgrade to a newer version.

This came as a big disappointment to me because I always assumed that the applications were independent of the operating system, and having to upgrade to a new operating system just in order to get a newer version of a text editor was quite laughable to me.

Result: Impossible to make changes, impossible to report bugs, impossible to upgrade to a newer release.

Wikipedia

A stunning example of collaborative, open-source community effort, Wikipedia (en.wikipedia.org) contains an astonishing amount of information on a bewildering variety of subjects. There is lots of useful, accurate information, and of course useless and/or inaccurate information too. The good thing is that anyone can help the effort to improve this free resource, so if you spot a mistake, whether it's a simple typo or a factual error, or if you want to add some more information or links, you can. There's a powerful versioning system behind it, with compare functions, so all the old edits are stored for posterity, comparison, and if necessary reversion. I've gleaned lots of useful and entertaining information from there, so I contribute a meagre amount back with corrections, additions and the occasional graphic. It's completely without reward or feedback but for some strange reason, nevertheless rewarding...

Result: Modifications made available instantaneously, and the changes actually stick rather than being instantly reverted.

Openstreetmap

Another stunning volunteer effort, openstreetmap.org has the lofty goal of mapping the world, for free. As well as out-of-copyright maps and donated satellite images, they have small armies of volunteers with GPS receivers who trace out all the roads and footpaths and cycleways. In some places these free, volunteer maps are superior to the closed-source ones like Google Maps or Yahoo maps.

There are also other specialised applications of this project specifically for cycle maps, and even piste maps. And the whole thing is editable just like Wikipedia, although here registration is necessary. The tools are a bit clunky but they really work and you get to see your edits in the real maps after just a short delay.

Result: Modifications made available after about a week, and they really work.

GpsPrune

The main contribution until now (apart from this website, of course!) is the release of a nifty piece of GPS utility software (called GpsPrune) under the GPL. And this is a chance to see things from the other perspective, with users offering suggestions, asking questions, and yes even helping to contribute. So even though the software is free, the fact that other people offer to help with the project is very rewarding!

Result: Software has been released for free and is still undergoing development. Users are contributing ideas and feedback, and language translations, and the download figures are growing with each release.

Summary

Some contributions are a lot easier to make than others, and it doesn't always depend on the complexity of the job. It also depends a lot on how active the project is, and therefore on how easily proposed changes can be integrated. With the example of Kate, the level of activity was far too high and the integration far too complex; with the example of Garble the activity was non-existent.

The Digikam example showed that in order to make the integration easier for the developers, it's important to provide patches in the desired format against the current development source. This places a large onus on the volunteer contributor, not just to submit observations on the released product, but to figure out how to obtain the (potentially unstable) latest source, figure out how to compile it (thereby having to remove their stable version), obtain all the dependencies, prepare and send the appropriate diff file, and so on. This can present a substantial barrier to casual contributions, but unfortunately it is often necessary to avoid wasting the developers' time. Not fully appreciating this, I wasted my own time by correcting mistakes in the released version which had already been corrected one week earlier in the latest, unreleased code.

In short, if you want to contribute you should find something you're interested in, and before you dive too deep into it just check out how easy it is to bring something to the project. Or maybe even start your own project: you can start small and watch it grow!