On the low adoption of automated testing in FOSS

A few times in the recent past I've been in the unfortunate position of using a prominent Free and Open Source Software (FOSS) program or library, and running into issues so fundamental that I wondered how they even made it into a release.

In all cases, the answer came quickly once I realized that the project involved either didn't have a test suite or, if it did have one, the suite was not adequately comprehensive.

I am using the term comprehensive in a very practical, non-extreme way. I understand that it's often not feasible to test every possible scenario and interaction, but, at the very least, a decent test suite should ensure that under typical circumstances the code delivers all the functionality it promises.

For projects of any value and significance, having such a comprehensive automated test suite is nowadays considered a standard software engineering practice. Why, then, don't we see more prominent FOSS projects employing this practice, or, when they do, why is it often employed poorly?

In this post I will highlight some of the reasons that I believe play a role in the low adoption of proper automated testing in FOSS projects, and argue why these reasons may be misguided. I will focus on topics that are especially relevant from a FOSS perspective, omitting considerations that, although important, are not particular to FOSS.

My hope is that by shedding some light on this topic, more FOSS projects will consider employing an automated test suite.

As you can imagine, I am a strong proponent of automated testing, but this doesn't mean I consider it a silver bullet. I do believe, however, that it is an indispensable tool in the software engineering toolbox, one that should be forsaken only after careful consideration.

1. Underestimating the cost of bugs

Most FOSS projects, at least those not supported by some commercial entity, don't come with any warranty; it's even stated in the various licenses! The lack of any formal obligations makes it relatively inexpensive, both in terms of time and money, to have the occasional bug in the codebase. This means that there are fewer incentives for the developer to spend extra resources to try to safeguard against bugs. When bugs come up, the developers can decide at their own leisure if and when to fix them and when to release the fixed version. Easy!

At first sight, this may seem like a reasonably pragmatic attitude to have. After all, if fixing bugs is so cheap, is it worth spending extra resources trying to prevent them?

Unfortunately, bugs are only cheap for the developer, not for the users who may depend on the project for important tasks. Users expect the code to work properly and can get frustrated or disappointed if this is not the case, regardless of whether there is any formal warranty. This is even more pronounced when security concerns are involved, for which the cost to users can be devastating.

Of course, lack of formal obligations doesn't mean that there is no driver for quality in FOSS projects. On the contrary, there is an exceptionally strong driver: professional pride. In FOSS projects the developers are in the spotlight and no (decent) developer wants to be associated with a low-quality, bug-infested codebase. It's just that, due to the mentality described above, the trade-offs developers make in many FOSS projects seem to favor a reactive rather than a proactive attitude.

2. Overtrusting code reviews

One of the development practices FOSS projects employ ardently is code review. Code reviews happen naturally in FOSS projects, even in small ones, since most contributors don't have commit access to the code repository and the original author has to approve any contributions. In larger projects there are often more structured procedures which involve sending patches to a mailing list or to a dedicated reviewing platform. Unfortunately, in some projects the trust in code reviews is so great that other practices, like automated testing, are forsaken.

There is no question that code reviews are one of the best ways to maintain and improve the quality of a codebase. They can help ensure that code is properly designed, is aligned with the overall architecture, and furthers the long-term goals of the project. They also help catch bugs, but only some of them, some of the time!

The main problem with code reviews is that we, the reviewers, are only human. We are great at creative thought, but we are also great at overlooking things, occasionally filling in the gaps with our own unicorns-and-rainbows inspired reality. We also tend to focus more on code changes at a local level, and less on how they affect the system as a whole. This is not an inherent problem with the process itself but rather a limitation of the humans performing it. When a codebase gets large enough, it's difficult for our brains to keep all the possible states and code paths in mind and check them mentally, even when the codebase is properly designed.

In theory, the problem of human limitations is offset by the open nature of the code. We even have the so-called Linus's law, which states that "given enough eyeballs, all bugs are shallow". Note the clever use of the indeterminate term "enough". How many are enough? And what about the qualitative aspects of the "eyeballs"?

The reality is that most contributions to big, successful FOSS projects are reviewed on average by a couple of people. Some projects are better, most are worse, but in no case does being FOSS magically lead to a large number of reviewers tirelessly checking code contributions. This limit in the number of reviewers also limits the extent to which code reviews can stand as the only process to ensure quality.

3. It's not in the culture

In order to try out a development process in a project, developers first need to learn about it and be convinced that it will be beneficial. Although there are many resources, like books and articles, arguing in favor of automated tests, the main driver for trying new processes is still learning about them from more experienced developers when working on a project. In the FOSS world this also takes the form of studying what other projects, especially the high-profile ones, are doing.

Since comprehensive automated testing is not the norm in FOSS, this creates a negative network effect. Why should you bother writing automated tests if the high-profile projects, which you consider role models, don't do it properly or at all?

Thankfully, the culture is beginning to shift, especially in projects using technologies in which automated testing is part of the culture of the technology itself. Unfortunately, many system-level and middleware FOSS projects still live in a world without automated tests.

4. Tests as an afterthought

Tests as an afterthought is not a situation particular to FOSS projects, but it is especially relevant to them, since the way they spring up and grow can disincentivize the early writing of tests.

Some FOSS projects start as small projects to scratch an itch, without any plans for significant growth or adoption, so the incentives to have tests at this stage are limited.

In addition, many projects, even the ones that start with more lofty adoption goals, follow a "release early, release often" mentality. This mentality has some benefits, but at the early stages also carries the risk of placing the focus exclusively on making the project as relevant to the public as possible, as quickly as possible. From such a perspective, spending the probably limited resources on tests instead of features seems like a bad use of developer time.

As the project grows and becomes more complex, however, more and more opportunities for bugs arise. At this point, some projects realize that adding a test suite would be beneficial for maintaining quality in the long term. Unfortunately, for many projects, it's already too late. The code by now has become test-unfriendly and significant effort is needed to change it.

The final effect is that many projects remain without an automated test suite, or, in the best case, with a poor one.

5. Missing CI infrastructure

Automated testing delivers the most value when combined with a CI service that runs the tests automatically for each commit or merge proposal. Until recently, such services were hard to come by at a reasonably low effort and cost. Developers either had to set up and host CI themselves, or pay for a commercial service, requiring resources that unsponsored FOSS projects were unlikely to be able to afford.

Nowadays, it's far easier to find and use free CI services, with most major code hosting platforms supporting them. Hopefully, with time, this reason will completely cease being a factor in the lack of automated testing adoption.

6. Not the hacker way

The FOSS movement originated from the hacker culture and still has strong ties to it. In the minds of some, the processes around software testing are too enterprise-y, too 9-to-5, completely contrary to the creative and playful nature of hacking.

My argument against this line of thought is that hacker culture values technical excellence very highly, and automated testing, as a tool that helps achieve such excellence, cannot be contrary to the hacker way.

Some pseudo-hackers may also argue that their skills are so refined that their code doesn't require testing. When we are talking about a codebase of any significant size, I consider this attitude a sign of inexperience and immaturity rather than a testament to superior skill.


I hope this post will serve as a good starting point for a discussion about the reasons which discourage FOSS projects from adopting a comprehensive automated test suite. Identifying both valid concerns and misconceptions is the first step in convincing both fledgling and mature FOSS projects to embrace automated testing, which will hopefully lead to an improvement in the overall quality of FOSS.

Bless Hex Editor 0.6.1

A long time ago, on a computer far, far away... well, actually, 14 years ago, on a computer that is still around somewhere in the basement, I wrote the first lines of source code for what would become the Bless hex editor.

For my initial experiments I used C++ with the gtkmm bindings, but C++ compilation times were so appallingly slow on my feeble computer, that I decided to give the relatively young Mono framework a try. The development experience was much better, so I continued with Mono and Gtk#. For revision control, I started out with tla (remember that?), but eventually settled on bzr.

Development continued at a steady pace until 2009, when life's responsibilities got in the way, and left me with little time to work on the project. A few attempts were made by other people to revive Bless after that, but, unfortunately, they also seem to have stagnated. The project had been inactive for almost 8 years when the gna.org hosting site closed down in 2017 and pulled the official Bless page and bzr repository with it into the abyss.

Despite the lack of development and maintenance, Bless remained surprisingly functional through the years. I, and many others it seems, have kept using it, and, naturally, a few bugs have been uncovered during this time.

I recently found some time to bring the project back to life, although, I should warn, this does not imply any intention to resume feature development on it. My free time is still scarce, so the best I can do is try to maintain it and accept contributions. The project's new official home is at https://github.com/afrantzis/bless.

To mark the start of this new era, I have released Bless 0.6.1, containing fixes for many of the major issues I could find reports for. Enjoy!

Important Note: There seems to be a bug in some versions of Mono that manifests as a crash when selecting bytes. The backtrace looks like:

free(): invalid pointer

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) GLib.SList.g_free (intptr) <0x0005f>
  at GLib.ListBase.Empty () <0x0013c>
  at GLib.ListBase.Dispose (bool) <0x0000f>
  at GLib.ListBase.Finalize () <0x0001d>
  at (wrapper runtime-invoke) object.runtime_invoke_virtual_void__this__ (object,intptr,intptr,intptr) <0x00068>

Searching for this backtrace you can find various reports of other Mono programs also affected by this bug. At the time of writing, the mono packages in Debian and Ubuntu (4.6.2) exhibit this problem. If you are affected, the solution is to update to a newer version of Mono, e.g., from https://www.mono-project.com/download/stable/.

git-c2b: An alternative workflow for Chromium's Gerrit

There are two main options to handle reviews in git. The first option is to treat commits as the unit of review. In this commit-based flow, authors work on a branch with multiple commits and submit them for review, either by pushing the branch or by creating a patch series for these commits. Typically, each commit is expected to be functional and to be reviewable independently.

Here is a feature branch in a commit-based flow, before and after changing D to D' with an interactive rebase (E and F are also changed by the rebase, to E' and F'):

A-B-C       [master]       A-B-C          [master] 
     \                          \                  
      D-E-F [feature]            D'-E'-F' [feature] or [feature-v2]

The second option is to treat branches as the unit of review. In this branch-based flow, authors work on multiple dependent branches and submit them for review by pushing them to the review system. The individual commits in each branch don't matter; only the final state of each branch is taken into account. Some review systems call this the "squash" mode.

Here are some dependent branches for a feature in a branch-based flow, before and after updating feature-1 by adding D', and then updating the other branches by merging (we could rebase, instead, if we don't care about retaining history):

A-B-C       [master]       A-B-C           [master]
     \                          \
      D     [feature-1]          D--D'     [feature-1]
       \                          \  \
        E   [feature-2]            E--E'   [feature-2]
         \                          \  \
          F [feature-3]              F--F' [feature-3]

Some people prefer to work this way, so they can update their submission without losing the history of each individual change (e.g., keep both D and D'). This reason is unconvincing, however, since one can easily preserve history in a commit-based flow, too, by checking out a different branch (e.g., 'feature-v2') to work on.
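To make this concrete, here is a minimal demonstration in a scratch repository (branch names, file names, and commit messages are illustrative) of preserving history in a commit-based flow by reworking a change on a copy of the branch:

```shell
# Scratch repository for demonstration; everything here is illustrative.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
echo base > file && git add file && git commit -qm "A"

# The original submission: branch 'feature' with commit D.
git checkout -qb feature
echo change >> file && git commit -qam "D"

# Rework the change on a copy of the branch: 'feature' keeps the original
# D, while 'feature-v2' carries the reworked D'.
git checkout -qb feature-v2 feature
echo reworked >> file && git commit -q --amend -am "D'"
```

After this, both the old and the new version of the change remain reachable, each on its own branch, without any extra branch bookkeeping during development.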

Personally, I find branch-based flows a pain to work with. Their main fault is the distracting and annoying user experience when dealing with multiple dependent changes. Setting up and maintaining the dependent branches during updates is far from straightforward. What would normally be a simple 'git rebase -i', now turns into a fight to create and maintain separate dependent branches. There are tools that can help (git-rebase-update), but they are no match for the simplicity and efficiency of rebasing interactively in a single branch.

Chromium previously used the Rietveld review system, which treats branches as its unit of review. Recently Chromium switched to Gerrit but, instead of sticking with Gerrit's native commit-based flow, adapted its tools to provide a branch-based flow similar to Rietveld's. Interacting with Chromium's review system is done mainly through the git-cl tool, which has evolved over the years to support both flows. At this point, however, the commit-based flow is essentially unsupported and broken for many use cases. Here is what working on Chromium typically looks like:

# Create and work on first branch
$ git checkout -b feature -t origin/master
$ git commit -m 'Feature'
$ git commit -m 'Update to feature'
# Create and work on second (dependent) branch
$ git checkout -b feature-next -t feature
$ git commit -m 'Feature next'
$ git commit -m 'Update to feature next'
# Upload the changes for review
$ git checkout feature
$ git cl upload --dependencies

I wrote the git-c2b (commits-to-branches) tool to be able to maintain a commit-based git flow even when working with branch-based review systems, such as Chromium's Gerrit. The idea, and the tool itself, is simple but effective. It allows me to work as usual in a single branch, splitting changes into commits and amending them as I like. Just before submitting, I run git-c2b to produce separate dependent branches for each commit. If the branches already exist they are updated without losing any upstream metadata.
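To give a feel for the idea, the core of the commits-to-branches transformation can be expressed as a small shell function. This is a hypothetical sketch under simplified assumptions, not the actual git-c2b implementation (in particular, it ignores the upstream metadata handling mentioned above):

```shell
# Hypothetical sketch of the commits-to-branches idea (not the real git-c2b
# implementation): for each commit on top of a base, (re)point a numbered
# branch "<prefix>-N" at that commit.
c2b_sketch() {
    base=$1        # upstream base, e.g. origin/master
    prefix=$2      # branch name prefix, e.g. feature
    n=${3:-1}      # starting number, like git-c2b's -n flag
    for commit in $(git rev-list --reverse "$base"..HEAD); do
        git branch -f "$prefix-$n" "$commit"
        n=$((n + 1))
    done
}
```

Running `c2b_sketch origin/master feature` on a branch holding two commits would create or update `feature-1` and `feature-2`, one per commit, ready for upload.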

This is my current workflow with Chromium and git-c2b:

# Create patchset in branch
$ git checkout -b feature -t origin/master
$ git commit -m 'Change 1'
$ git commit -m 'Change 2'
# Use git-c2b to create branches feature-1, feature-2, ... for each commit
$ git c2b
# Upload the changes for review
$ git checkout feature-1
$ git cl upload --dependencies

To update the patches and dependent CLs:

$ git checkout feature
$ git rebase -i origin/master
# Use c2b to update the feature-1, feature-2, ... branches
$ git c2b
# Upload the changes for review
$ git checkout feature-1
$ git cl upload --dependencies

When changes start to get merged, I typically need to reupload only the commits that are left. For example, if the changes from the first two commits get merged, I will rebase on top of master, and the previously third commit will now be the first. You can tell git-c2b to start updating branches starting from a particular number using the -n flag:

# The first two changes got merged, get new master and rebase on top of it
$ git fetch
$ git checkout feature
$ git rebase -i origin/master
# At this point the first two commits will be gone, so tell c2b to update
# feature-3 from the first commit, feature-4 from the second and so on.
$ git c2b -n 3
# Upload the remaining changes for review
$ git checkout feature-3
$ git cl upload --dependencies

Although the main driver for implementing git-c2b was improving my Chromium workflow, there is nothing Chromium-specific about this tool. It can be used as a general solution to create dependent branches from commits in any branch. Enjoy!
