DejaGNU

Sorry for the long delay. Anyhow, I wrote that so that I could write this.

DejaGNU is the test harness used by gcc, gdb, the GNU binutils, and probably other programs as well. Frankly, it’s a disaster. The documentation is weak, the implementation is complex and confusing, it’s slow, it does not support running tests in parallel, and it’s hard to use. It has exactly two things in its favor, and they are powerful. The first is that it mostly works. The second is that people have written many different board support packages which let it test cross-compilers on simulators and real hardware.

DejaGNU was initially written at Cygnus by Rob Savoye as a way to test gdb. I don’t recall if there was any gdb testsuite prior to DejaGNU, but if there was it was largely useless. Because gdb was a command-line program, the idea for DejaGNU was to write a test harness which could run gdb, send it commands, and examine the resulting output. That was the first mistake. It meant that all the gdb tests were required to look for syntactic details of the output which were irrelevant to the test. Tests for gdb revolve around making sure that gdb stops in the right place and can print local variables and do backtraces and so forth. That is a lot of output which DejaGNU matches using regexps. I think it would have been smarter to put the effort into adding a test harness internally to gdb itself, so that a program could query gdb’s state. This could have evolved into the MI output format which wound up getting added to gdb anyhow. Or it could have evolved into a library interface for gdb, something which would have been very useful for IDEs and other purposes and still does not really exist.
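To make the difference concrete, here is a minimal Python sketch of the two styles. It is only an illustration: the sample console text, the regexp, and the structured record are assumptions made up for this example, not gdb’s actual interfaces.

```python
import re

# Console text of the kind gdb prints when it stops at a breakpoint.
# The exact wording here is illustrative and may differ between versions.
CLI_OUTPUT = "Breakpoint 1, main () at hello.c:5\n"

# Output-matching style: the test only cares about function, file, and line,
# but the pattern also has to spell out the breakpoint number and punctuation.
STOP_RE = re.compile(r"Breakpoint \d+, (\w+) \(.*\) at ([^:\s]+):(\d+)")


def stop_location_from_text(text):
    """Recover (function, file, line) by scraping human-readable output."""
    m = STOP_RE.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


# State-query style: a hypothetical structured record that an internal
# harness could hand back directly, leaving nothing to scrape.
HYPOTHETICAL_STATE = {"reason": "breakpoint-hit", "func": "main",
                      "file": "hello.c", "line": 5}


def stop_location_from_state(state):
    """Read the same facts straight out of the structured record."""
    return state["func"], state["file"], state["line"]
```

Both functions report the same stop location for the sample above, but only the first one breaks when the wording shifts. For instance, if an address were printed before the function name ("Breakpoint 1, 0x0000000000401136 in main () at hello.c:5"), the regexp would no longer match even though gdb stopped in exactly the right place.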

Anyhow, once the decision was made to test gdb as a pure command line program, Rob looked for a program which could do that. He came across expect. The tag line of the expect paper from 1990 is “Curing Those Uncontrollable Fits of Interaction.” The expect program does a nice job of that: if you have a program that you can only interact with manually, expect lets you write a program to interact with it instead. So expect is a nice choice when you need to work with a program you don’t control. In our case, we did control gdb; choosing expect was a hack to save time modifying gdb, a hack we are still paying for nearly 20 years later.

Expect uses an embedded Tcl interpreter, so expect programs are Tcl programs. This is a good use of Tcl: it means that expect has a full programming language for writing scripts. Since interacting with other programs is all about strings, it’s perfectly reasonable to use Tcl, which is also all about strings. The consequence for DejaGNU, though, is that DejaGNU is written in Tcl.

Once Cygnus was using DejaGNU as a test harness for gdb, it seemed natural to use it as a test harness for gcc as well. But of course gcc is not an interactive program, so the advantages of using expect no longer applied. The disadvantages of Tcl remained intact.

Cygnus specialized in cross-compilers, so DejaGNU grew the ability to build programs for target boards and run them there, using various different communication mechanisms. None of this had much to do with expect or Tcl, but it was all written in Tcl because that was the mechanism available. All this support is the main reason it is difficult to move away from DejaGNU today.

At least on a native system, it’s natural to want to run tests in parallel. That’s only become more important over the years. Unfortunately, Tcl doesn’t support threads (there is a thread extension available these days, but it is written in such a way that it would have to be integrated into expect before DejaGNU could use it). It’s easy enough to write Tcl code to start gcc a bunch of times, but it’s much harder to write Tcl code to examine those results. The gcc testsuite does now run in parallel, but it does so by manually creating subsets of tests and invoking DejaGNU multiple times in parallel to run those subsets. This works but is hardly optimal.
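For comparison, a harness written in a language with ordinary threads or processes can start the tests and collect their results in a few lines. The following is a minimal Python sketch, not a replacement harness: `run_one` and `run_parallel` are hypothetical names, and the commands are stand-ins that simply start the Python interpreter where a real harness would run the compiler on each test file. It uses plain pipes rather than pseudo terminals.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one test command over plain pipes; return (cmd, exit status, output)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return cmd, proc.returncode, proc.stdout


def run_parallel(cmds, jobs=4):
    """Run the commands concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, cmds))


if __name__ == "__main__":
    # Stand-in commands for illustration only.
    cmds = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(4)]
    for cmd, status, output in run_parallel(cmds):
        print(status, output.strip())
```

Threads are enough here because each worker spends its time waiting on a child process. The point is that examining the results is no harder than starting the jobs.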

As a highly dynamic interpreted language, Tcl is relatively slow. The expect program is quite clever and sets up pseudo terminals in order to properly interact with general programs, an effort that is useless when testing a simple program like gcc. A significant amount of the CPU time taken by a gcc testsuite run is for expect, time which is largely wasted.

The DejaGNU code is complex and hard to read. This is not entirely the fault of Tcl, but Tcl is partly to blame as DejaGNU struggles with namespace issues. Function and variable names are constructed at runtime to avoid namespace collisions, which makes it very hard to figure out what code will run. It plays games like having tests return the name of the function to run to report whether the test succeeded (e.g., return “pass” to invoke the function named pass), which sounds almost clever until you realize that there is nothing which prevents you from returning an invalid value.
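The return-a-function-name trick is easy to mimic in other languages, which makes the weakness easy to see. Here is a rough Python sketch of the idea next to one possible alternative. None of this is DejaGNU code; every name in it is hypothetical.

```python
from enum import Enum


def report_pass(name):
    return f"PASS: {name}"


def report_fail(name):
    return f"FAIL: {name}"


# Name-based dispatch, in the spirit of the scheme described above: the test
# returns a string and the harness calls whichever reporter has that name.
REPORTERS = {"pass": report_pass, "fail": report_fail}


def report_by_name(test, name):
    outcome = test()                 # any string at all can come back
    return REPORTERS[outcome](name)  # a misspelling only surfaces here


class Outcome(Enum):
    """A closed set of results; anything else is rejected explicitly."""
    PASS = "PASS"
    FAIL = "FAIL"


def report_by_enum(test, name):
    outcome = test()
    if not isinstance(outcome, Outcome):
        raise TypeError(f"{name}: test returned {outcome!r}, not an Outcome")
    return f"{outcome.value}: {name}"
```

With the first form, a test that returns "passs" fails deep inside the harness with a lookup error. With the second, the set of legal results is written down in one place, and a value outside it is reported as a bug in the test.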

I could continue with more specific horrors from DejaGNU, but those are in principle fixable. I hope that the earlier points show that DejaGNU itself is broken by design. We need to move away from it.

Unfortunately DejaGNU’s large knowledge base of how to run programs on embedded systems, a knowledge base which is largely represented in hand-written Tcl code, is very hard to get around. Some of these scripts can be automatically translated into a better test harness, in that they simply set flags for various tools and set a communication mechanism. Many others will require hand conversion.

Unfortunately DejaGNU has more or less blighted the world of free test harnesses. There is CodeSourcery’s qmtest program, but I don’t know how widely that is used. Fortunately, test harnesses need not be particularly complex. I don’t think it would be that hard for a thoughtful person to replace DejaGNU for gcc testing, and I think the benefits would be manifold. Replacing it for gdb testing would be harder, as the gdb tests rely more on string matching. As I mentioned above, that is in itself a bug, but it means that recreating the tests is hard.

There are various test scripts which are built around DejaGNU’s log files, basically attempts to parse human readable information. Those will have to change for any new test harness.
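As an illustration of what such scripts do, here is a small Python sketch that scrapes result lines and compares two runs. It assumes lines of the form "STATUS: test name", as found in DejaGNU summary files. The function names and the sample test names are made up for the example.

```python
import re

# Assumed line shape: "STATUS: test name". Everything else is ignored.
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNTESTED|UNSUPPORTED): (.+)$")


def parse_results(lines):
    """Map each test name to the status reported for it."""
    results = {}
    for line in lines:
        m = RESULT_RE.match(line.rstrip("\n"))
        if m:
            results[m.group(2)] = m.group(1)
    return results


def regressions(before, after):
    """Names of tests that passed in the earlier run and fail in the later one."""
    return sorted(name for name, status in after.items()
                  if status == "FAIL" and before.get(name) == "PASS")


# Hypothetical sample data standing in for two summary files.
OLD_RUN = [
    "PASS: gcc.dg/example-1.c (test for excess errors)\n",
    "PASS: gcc.dg/example-2.c execution test\n",
    "# of expected passes\t\t2\n",   # summary line, ignored by the parser
]
NEW_RUN = [
    "PASS: gcc.dg/example-1.c (test for excess errors)\n",
    "FAIL: gcc.dg/example-2.c execution test\n",
]

if __name__ == "__main__":
    print(regressions(parse_results(OLD_RUN), parse_results(NEW_RUN)))
```

Everything here depends on the exact text of the log. A harness that wrote its results in a structured format would make this kind of script both shorter and less fragile.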

Although I don’t have a good alternative, I hope I have at least demonstrated that DejaGNU must go. Effort put into working with DejaGNU is effort wasted.

1 Comment »

  1. Joel Brobecker said,

    April 27, 2011 @ 8:01 am

    Interesting history about dejaGNU!

    We actually reached similar conclusions at AdaCore, and the only thing that prevents me from seriously suggesting that we move away from dejaGNU is the fact that we’d have so many testcases to convert. If it were to be done by hand, it would be one gigantic and mostly uninteresting task.

    As for testing GDB internally at AdaCore, we made the decision very early on to not rely on dejaGNU. Another one of the drawbacks of dejaGNU is that it doesn’t always work well out of the box on what I call the “exotic platforms” such as AIX, Tru64 or HP-UX. Any piece of code that deals with pseudo terminals is bound to have portability issues… I’ve managed to make it work, at some point in the past, but that took a lot of effort.

    For GDB, we have a Python-based testsuite that allows us to write testcases in Python. We’re half-way between what you suggest, and what we currently do with dejaGNU. The testsuite provides an infrastructure that allows us to build programs, insert breakpoints, etc., without having to worry about the actual syntax of commands and their output. But we also do a fair bit of output matching. We do not use regexps in the expected output, however, except embedded inside the expected output when absolutely necessary. Regexps are powerful, but very hard to read.

    All in all, it’s been working really well for us, and we’ve been able to adapt our testsuite to things such as changes of output fairly easily.

    We do internal-inspection testing as well, but on GPS, our IDE. It’s based on using the Python interpreter that is also available to the users. Eventually, as the Python support in GDB matures, I suspect we’ll be able to do something similar as well.
