Archive for Programming

Non-free Services

As both of my faithful readers can see, my blog postings have dropped significantly. I’ve been posting my random little comments on Google+ instead.

Which leads me to the following. There is a hard-core group of people who only use free software. I’m not quite that hard-core, but in practice I do use only free software, except perhaps for some binary drivers in the kernel (I don’t actually know whether the systems I’m running use binary drivers or not, and I’m not hard-core enough to find out).

I’ve seen some people argue that if you are serious about using free software, you should also only use Internet services which are themselves free software. For example, you should not use Facebook or Google+, because the software used to run those services is not free.

I don’t agree with that argument. The key goal of free software is that I always have the right to change the software that I am running. When I use an Internet service like Google+, I am not running the software. Even if I had a copy of the software, I would not be able to run it, because I don’t have enough servers. And even if I had enough servers, it would be useless for me to run the software, because I don’t have the data. And there is no way to grant me access to the data, because that would violate the reasonable privacy choices of everybody else using the service.

When it comes to a service like Google+, whether the software is free is not important. Releasing the software would not give me any more freedom than I already have. Google+ is only interesting when many people are operating out of a single shared data base, and that data base must have privacy safeguards to ensure that it is not copied.

What matters with Google+ is not the software, but the data. It is important that I be able to retrieve all my data associated with Google+, and that I be able to retrieve it in a way that makes it possible to use with other software. That is, I should be able to retrieve my posts, my comments on other people’s posts, my list of followers, my photos, etc. And I should be able to plug them into some other software service if I so choose.

In fact Google+ does have a set of APIs which permit me to retrieve my data. I haven’t verified that all Google+ data is available via the APIs, but all the obvious stuff seems to be available. Given those APIs, it should be possible for me to move all my data to some other service which provides te required APIs itself.

So I personally don’t see any reason why even a hard-core free software supporter should avoid using a service like Google+. This isn’t to say that it wouldn’t be nice if Google freed up the software and accepted patches from outside users. It’s just that that is not a critical part of freedom to use software.

Comments (5)

Sudoku

I’ve been playing Sudoku on Google+. i’ve more or less mastered the easy and medium levels, but it takes me about 30 minutes to do a hard level, and I haven’t tried expert yet. Sudoku is a fairly dumb game in some ways; as a colleague of mine pointed out, it’s trivial to write a computer program which will win every time. But I find the game somewhat interesting because it mirrors, in reverse, the way I think about programming.

You can write a computer program more or less any way you like. So I tend to think of a program in terms of constraints. Typical constraints are: the desired behaviour; the available runtime; the algorithmic complexity; the available libraries; the language; maintainability; who is going to review the code and what they will accept. Write a program is a matter of finding the simplest solution which meets the constraints. Difficult programming problems are ones where the constraints come into conflict, and it’s hard to see your way through.

Sudoku works the same way, only in reverse. In programming you are allowed to write any code that meets the constraints. In Sudoku you know that there is only one solution, so you have to look for moves that are forced by the constraints. Solving a Sudoku puzzle is a matter of looking deeper and deeper into the problem until you have eliminated all moves but one.

My hope is that practice in this area will subconsciously encourage me to look deeper for constraints when writing code, which will save time in the long run because I will have to throw away less code. I doubt this will actually work, but it seems worth a try.

Also Sudoku is a good way to exercise short term memory, as I’m avoiding writing anything down while solving the puzzle. I used to play cards regularly (bridge, whist) and I was able to remember the location of many of the cards in other people’s hands. I noticed that I lost that facility as I’ve failed to practice it. As i write this I realize that short term memory is not too important in today’s world, but at least it makes me feel smarter.

Comments (3)

CVS SSH

Sorry for the long posting hiatus. I have enough readers now that it’s hard to write the usual random nonsense.

I was recently reminded of an old problem using CVS over SSH, which was an interesting example of various different instances of reasonable behaviour adding up to a bug. It’s possible that this bug has been fixed, but I’ll assume that it hasn’t. The bug is that “cvs diff 2>&1 | less” will sometimes drop data, leaving you looking at an incomplete diff.

When CVS invokes SSH, it sets up file descriptors 0 (standard input) and 1 (standard output), but not 2 (standard error). Thus SSH inherits file descriptor 2 from CVS. This means that any SSH errors get reported to the standard error passed to CVS, which is what you want to have happen. However, when using 2>&1, this means that SSH’s file descriptor 2 will be the same as CVS’s file descriptors 1 and 2.

SSH puts its file descriptors 0, 1, and 2 into nonblocking mode, so that it can use select to send data back and forth without blocking. This means that SSH puts CVS’s file descriptor 2 into nonblocking mode. When using 2>&1, file descriptors 1 and 2 are the same, so this puts CVS’s file descriptor 1 into nonblocking mode.

CVS uses stdio to output data to standard output. When writing to a pipe, the buffer can fill up. CVS naturally never puts the descriptor into nonblocking mode, but when SSH has done it indirectly, and the buffer fills, the data being written will be discarded. This is what causes the bug: the discarded data is never seen by the user.

So what’s the fix? It’s reasonable for SSH to put its descriptors into nonblocking mode. It’s reasonable for CVS to pass its file descriptor 2 to SSH. It’s reasonable for CVS to use stdio to output data. It’s reasonable for stdio to not specially handle a nonblocking file descriptor—any program which wants to use a nonblocking descriptor needs to handle I/O retries itself. It’s reasonable for 2>&1 to mean that file descriptors 1 and 2 refer to the same underlying pipe. It’s reasonable for the user to use 2>&1 when piping cvs diff output to less.

I think the only remaining link in the sequence leading to the bug is that when SSH sets its file descriptor 2 to nonblocking mode, this affects the file in CVS. This is a consequence of the Unix file model, in which file descriptors refer to underlying files. A file descriptor has only one flag: whether it is closed if the exec system cal is run. All the other information is attached to the underlying file. Using 2>&1 means that two file descriptors point to the same file. Forking and execing SSH does not change this–in fact, it adds two more file descriptors, in the SSH process, which point to the same file. Any change in the flags associated with that file is seen by all the associated file descriptors.

This separation of file descriptor and file is what makes 2>&1 work. It’s also what makes >> work; >> opens a file in append mode, and the append flag is inherited by other processes which refer to that file. In any case, what really counts here is not the exec, but the fork; forking a process should not change the flags associated with a file. Further, I’m sure there programs which depend on the fact that changing the flags on a file after a fork affects the file as seen by the parent process.

It’s possible to imagine that file descriptors point to a new shared structure which then points to the underlying file. The file position and some flags would stay with the underlying flie. The new shared structure would just hold some flags which need not always be shared: O_NONBLOCK, O_ASYNC, etc. Calling fork would not create a new shared structure, but calling exec would, copying the existing structure. That would let some flags not be copied across exec process boundaries.

However, that would be a significant change to the Unix file model, a model which has lasted for decades and is not clearly broken. Absent that change, we are left with a complex bug, in which all choices are reasonable.

The workaround for the problem is to invoke SSH with 2>/dev/null, and assume that SSH never writes anything useful to standard error. The 2>/dev/null disassociates SSH file descriptor 2 from CVS file descriptor 2, so CVS file descriptor 1 is not accidentally set into nonblocking mode.

Comments (7)

Executable stack

The gcc compiler implements an extension to C: nested functions. A trivial example:


int f() {
int i = 2;
int g(int j) { return i + j; }
return g(3);
}

The function f will return 5. Note in particular that the nested function g refers to the variable i defined in the enclosing function.

You can mostly treat nested functions as ordinary functions. In particular, you can take the address of a nested function, and you can pass the resulting function pointer to another function, that function can make a call through the function pointer to the nested function, and the nested function will correctly refer to variables in its caller’s stack frame. I’m not here going to go into the details of how this is implemented. What I will say is that gcc currently implements this by writing instructions to the stack and using a pointer to those instructions. This requires that the stack be executable.

This approach was implemented many years ago, before computers were routinely attacked. In the hostile Internet environment of today, an area of memory that is both writable and executable is dangerous, because it gives an attacker space to create brand new instructions to execute. Since the stack must be writable, this means that we want to make the stack non-executable if possible. Since very few programs use nested functions, this is normally possible. But we don’t want to break those few programs either.

This is how the GNU tools do it on ELF systems such as GNU/Linux. The compiler adds a new section to all code that it compiles. The section is named .note.GNU-stack. It is empty and not allocated, which means that it takes up no space at runtime. If the code being compiled does not require an executable stack—the normal case—the compiler doesn’t set any flags for the section. If the code does require an executable stack, the compiler sets the SHF_EXECINSTR flag.

When the linker links a program, it checks each input object for a .note.GNU-stack section. If there is no such section, the linker assumes that the object must be old, and therefore may require an executable stack. If there is such a section, the linker checks the section flags to see whether the code requires an executable stack. The linker discards the .note.GNU-stack sections, and creates a PT_GNU_STACK segment in the output executable. The PT_GNU_STACK segment is empty and is not part of any PT_LOAD segment. The segment flags PF_R and PF_W are always set. If the linker has determined that the program requires an executable stack, it also sets the PF_X flag.

When the Linux kernel starts a program, it looks for a PT_GNU_STACK segment. If it does not find one, it sets the stack to be executable (if appropriate for the architecture). If it does find a PT_GNU_STACK segment, it marks the stack as executable if the segment flags call for it. (It’s possible to override this and force the kernel to never use an executable stack.) Similarly, the dynamic linker looks for a PT_GNU_STACK in any executable or shared library that it loads, and changes the stack to be executable if any of them require it.

When this all works smoothly, most programs wind up with a non-executable stack, which is what we want. The most common reason that this fails these days is that part of the program is written in assembler, and the assembler code does not create a .note.GNU_stack section. If you write assembler code for GNU/Linux, you must always be careful to add the appropriate line to your file. For most targets, the line you want is:

.section .note.GNU-stack,"",@progbits

There are some linker options to control this. The -z execstack option tells the linker to mark the program as requiring an executable stack, regardless of the input files. The -z noexecstack option marks it as not requiring an executable stack. The gold linker has a --warn-execstack option which will cause the linker to warn about any object which is missing a .note.GNU-stack option or which has an executable .note.GNU-stack option.

The execstack program may also be used to query whether a program requires an executable stack, and to change its setting.

These days we could probably change the default: we could probably say that if an object file does not have a .note.GNU-stack section, then it does not require an executable stack. That would avoid the problem of files written in assembler which do not create the section. It’s possible that this would cause some programs to incorrectly get a non-executable stack, but I think that would be quite unlikely in practice. An advantage of changing the default would be that the compiler would not have to create an empty .note.GNU-stack section in all object files.

By the way, there is one thing you can do with a normal function that you can not do with a nested function: if the nested function refers to any variables in the enclosing function, you can not return a pointer to the nested function to the caller. If you do, the variable will disappear, so the variable reference in the nested function will be dangling reference. It’s worth noting here that the Go language supports nested function literals which may refer to variables in the enclosing function, and when using Go this works correctly. The compiler creates variables on the heap if necessary, so they do not disappear until the garbage collector determines that nothing refers to them any more.

Finally, I’ll mention that there are some plans to implement a different scheme for nested functions in C, one which does not require any memory to be both writable and executable, but these plans have not yet been implemented. I’ll leave the implementation as an exercise for the reader.

Comments (4)

DejaGNU

Sorry for the long delay. Anyhow, I wrote that so that I could write this.

DejaGNU is the test harness used by gcc, gdb, the GNU binutils, and probably other programs as well. Frankly, it’s a disaster. The documentation is weak, the implementation is complex and confusing, it’s slow, it does not support running tests in parallel, it’s hard to use. It has exactly two things in its favor, and they are powerful. The first is that it mostly works. The second is that people have written many different board support packages which let it test cross-compilers on simulators and real hardware.

DejaGNU was initially written at Cygnus by Rob Savoye as a way to test gdb. I don’t recall if there was any gdb testsuite prior to DejaGNU, but if there was it was largely useless. Because gdb was a command-line program, the idea for DejaGNU was to write a test harness which could run gdb, send it commands, and examine the resulting output. That was the first mistake. It meant that all the gdb tests were required to look for syntactic details of the output which were irrelevant to the test. Tests for gdb revolve around making sure that gdb stops in the right place and can print local variables and do backtraces and so forth. That is a lot of output which DejaGNU matches using regexps. I think it would have been smarter to put the effort into adding a test harness internally to gdb itself, so that a program could query gdb’s state. This could have evolved into the MI output format which would up getting added to gdb anyhow. Or it could have evolved into a library interface for gdb, something which would have been very useful for IDEs and other purposes and still does not really exist.

Anyhow, once the decision was made to test gdb as a pure command line program, Rob looked for a program which could do that. He came across expect. The tag line of the expect paper from 1990 is “Curing Those Uncontrollable Fits of Interaction.” The expect program does a nice job of that: if you have a program that you can only interact with manually, expect lets you write a program to interact with it instead. So expect is a nice choice when you need to work with a program you don’t control. In our case, we did control gdb; choosing expect was a hack to save time modifying gdb, a hack we are still paying for nearly 20 years later.

Expect uses an embedded Tcl interpreter, so expect programs are Tcl programs. This is a good use of Tcl: it means that expect has a full programming language for writing scripts. Since interacting with other programs is all about strings, it’s perfectly reasonable to use Tcl, which is also all about strings. The consequence for DejaGNU, though, is that DejaGNU is written in Tcl.

Once Cygnus was using DejaGNU as a test harness for gdb, it seemed natural to use it as a test harness for gcc as well. But of course gcc is not an interactive program, so the advantages of using expect no longer applied. The disadvantages of Tcl remained intact.

Cygnus specialized in cross-compilers, so DejaGNU grew the ability to build programs for target boards and run them there, using various different communication mechanisms. None of this had much to do with expect or Tcl, but it was all written in Tcl because that was the mechanism available. All this support is the main reason it is difficult to move away from DejaGNU today.

At least on a native system, it’s natural to want to run tests in parallel. That’s only become more important over the years. Unfortunately, Tcl doesn’t support threads (there is a thread extension available these days, but it is written in such a way that it would have to be integrated into expect before DejaGNU could use it). It’s easy enough to write Tcl code to start gcc a bunch of times, but it’s much harder to write Tcl code to examine those results. The gcc testsuite does now run in parallel, but it does so by manually creating subsets of tests and invoking DejaGNU multiple times in parallel to run those subsets. This works but is hardly optimal.

As a highly dynamic interpreted language, Tcl is relatively slow. The expect program is quite clever and sets up pseudo terminals in order to properly interact with general programs, and effort that is useless when testing a simple program like gcc. A significant amount of the CPU time taken by a gcc testsuite run is for expect, time which is largely wasted.

The DejaGNU code is complex and hard to read. This is not entirely the fault of Tcl, but Tcl is partly to blame as DejaGNU struggles with namespace issues. Function and variable names are constructed at runtime to avoid namespace collisions, which makes it very hard to figure out what code will run. It plays games like having tests return the name of the function to run to report whether the test succeed (e.g., return “pass” to invoke the function named pass), which sounds almost clever until you realize that there is nothing which prevents you from returning an invalid value.

I could continue with more specific horrors from DejaGNU, but those are in principle fixable. I hope that the earlier points show that DejaGNU itself is broken by design. We need to move away from it.

Unfortunately DejaGNU’s large knowledge base of how to run programs on embedded systems, a knowledge base which is largely represented in hand-written Tcl code, is very hard to get around. Some of these scripts can be automatically translated into a better test harness, in that they simply set flags for various tools and set a communication mechanism. Many others will require hand conversion.

Unfortunately DejaGNU has more or less blighted the world of free test harnesses. There is CodeSourcery’s qmtest program, but I don’t know how widely that is used. Fortunately, test harness need not be particularly complex. I don’t think it would be that hard for a thoughtful person to replace DejaGNU for gcc testing, and I think the benefits would be manifold. Replacing it for gdb testing would be harder, as the gdb tests rely more on string matching. As I mentioned above that is in itself a bug, but it means that recreating the tests is hard.

There are various test scripts which are built around DejaGNU’s log files, basically attempts to parse human readable information. Those will have to change for any new test harness.

Although I don’t have a good alternative, I hope I have at least demonstrated that DejaGNU must go. Effort put into working with DejaGNU is effort wasted.

Comments (1)

« Previous entries Next Page » Next Page »