Archive for Programming

Exception Destruction

Languages that support exceptions need to support destructors or they need to support a try/finally construct. Otherwise using exceptions is too difficult, because if you have some local state to clean up in a function, you have to catch and rethrow every exception.

The goal of exceptions in C++ is that code which does not throw an exception should be just as efficient as code which is compiled without any support for exceptions. Unfortunately, this is impossible. When any function can throw an exception, and when there are destructors which must be run if an exception is thrown, the compiler is limited in its ability to move instructions across function calls. Of course it is not generally possible to move instructions which change global or heap memory across a function call, but in the absence of exceptions it is generally possible to move instructions which do not change memory or which change only stack memory. This means that exceptions limit what the compiler is able to do, and it follows that compiling with exception support generates code which is less efficient than compiling without exception support.

Of course exceptions still have their uses, but let’s consider programming without them (this is easy for me to imagine, since I didn’t use exceptions in the gold linker). If you program without exceptions, how useful are destructors and/or try/finally? What comes to mind is functions with multiple return points, loops with multiple exits, and RAII coding.

C has neither destructors nor try/finally. Does it miss them? I would say yes. A common workaround I’ve seen is to change all return points and loop exit points to use a goto to a label which does cleanups.
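As a rough sketch of that goto workaround: the function below is a hypothetical helper (count_bytes is not from any real code base) that opens a file, allocates a buffer, and sends every exit path through one cleanup label.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the number of bytes in the file at PATH,
   or -1 on any error.  Every exit goes through the "cleanup" label. */
long count_bytes(const char *path)
{
    long result = -1;   /* assume failure until the work is done */
    long total = 0;
    size_t n;
    FILE *f = NULL;
    char *buf = NULL;

    assert(path != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        goto cleanup;

    buf = malloc(4096);
    if (buf == NULL)
        goto cleanup;

    while ((n = fread(buf, 1, 4096, f)) > 0)
        total += (long)n;
    if (ferror(f))
        goto cleanup;

    result = total;

cleanup:
    /* One place releases everything, whichever exit was taken. */
    free(buf);              /* free(NULL) is a no-op */
    if (f != NULL)
        fclose(f);
    return result;
}
```

The cost is that each resource has to start out in a known "not acquired" state (NULL here) so that the cleanup code can run safely from any exit.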

The gcc compiler has an extension to C to support, in effect, destructors. You can use __attribute__ ((__cleanup__ (function))) with any local variable. When the variable goes out of scope, the function will be called, passing it the address of the variable. This is an effective extension, but it is not widely used.
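A minimal sketch of the attribute in use, assuming gcc or a compatible compiler. The names free_buffer, greeting_length and cleanup_calls are made up for illustration, and the counter exists only to make the cleanup observable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only: counts how often the cleanup function has run. */
int cleanup_calls = 0;

/* The cleanup function receives the address of the variable,
   so for a "char *" variable it takes a "char **". */
static void free_buffer(char **p)
{
    free(*p);               /* free(NULL) is a no-op */
    cleanup_calls++;
}

/* Hypothetical helper: builds "hello, NAME" in a heap buffer and
   returns its length, or -1 if allocation fails. */
int greeting_length(const char *name)
{
    char *buf __attribute__((__cleanup__(free_buffer))) = NULL;
    size_t size = strlen("hello, ") + strlen(name) + 1;
    int length;

    buf = malloc(size);
    if (buf == NULL)
        return -1;          /* free_buffer(&buf) runs on this return */

    snprintf(buf, size, "hello, %s", name);
    length = (int)strlen(buf);
    return length;          /* ... and on this one */
}
```

Neither return statement mentions the buffer, which is the point: the cleanup is attached to the variable rather than to each exit.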

Abolish Syntax

Although I can’t find it now, I think it was Dan Bernstein who said somewhere that programs should avoid syntax when possible. Using syntax means permitting syntax errors. Avoiding syntax means making syntax errors impossible.

Even if it wasn’t Bernstein who said this, you can see the idea in action in things like his tinydns configuration file format. There are no keywords or grouping constructs. Each statement is a single line. The first character on the line indicates the type of statement.
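To make the idea concrete, here is a sketch of a reader for a made-up format in the same spirit. It is not the real tinydns-data format: the statement letters, the two colon-separated fields, and all of the names (parse_line, struct stmt) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical statement types, selected by the first character of a line. */
enum stmt_type { STMT_COMMENT, STMT_HOST, STMT_ALIAS, STMT_UNKNOWN };

struct stmt {
    enum stmt_type type;
    char name[64];
    char value[64];
};

/* Parse one line of the form "<type-char>name:value".
   Returns 0 on success, -1 if the line is not recognized. */
int parse_line(const char *line, struct stmt *out)
{
    const char *colon;
    size_t name_len, value_len;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);    /* also NUL-terminates both fields */

    /* The first character alone decides what kind of statement this is. */
    switch (line[0]) {
    case '#': case '\n': case '\0':
        out->type = STMT_COMMENT;   /* comments and blank lines carry no data */
        return 0;
    case '+': out->type = STMT_HOST;  break;
    case 'C': out->type = STMT_ALIAS; break;
    default:
        out->type = STMT_UNKNOWN;
        return -1;
    }

    /* The rest of the line is two fields separated by a colon. */
    colon = strchr(line + 1, ':');
    if (colon == NULL)
        return -1;
    name_len = (size_t)(colon - (line + 1));
    value_len = strcspn(colon + 1, "\r\n");
    if (name_len == 0 || name_len >= sizeof out->name ||
        value_len == 0 || value_len >= sizeof out->value)
        return -1;

    memcpy(out->name, line + 1, name_len);
    memcpy(out->value, colon + 1, value_len);
    return 0;
}
```

There is no tokenizer, no nesting and no lookahead; the whole reader is one switch and one split, which leaves very little room for a statement to be misread.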

The expectation is that if people want a more comprehensible syntax, they will write a separate program which will read something and generate the un-syntax. That way any problems are isolated to that separate program. (Actually tinydns-data is itself a separate program which reads the un-syntax and turns it into a binary form for the tinydns program.)

I think this idea deserves wider use. It fits with the general idea that modules should be independent. When you have a program which needs to read some data, the format of that data should be as simple as possible. When it is desirable to permit a more complicated representation, that should be done by providing a mechanism to convert the complicated representation to the simple one.

Programming languages are of course a sinkhole of syntax. That said, an example of a programming language with minimal syntax is sed. Those with only a passing familiarity with sed may be surprised by how powerful it is; the t and T commands make it Turing complete. It would not be a satisfactory language for general purpose programming, but it is quite effective in its own domain. Another language with minimal syntax is, of course, APL.

GCC Inline Assembler

GCC’s inline assembler syntax is very powerful and is the best mechanism I know of to mix assembly code and optimized C/C++ code. It lets you take advantage of assembly features like add-with-carry or direct calls into the kernel without losing optimizations. I don’t know of any other approach which supports that.

That said, the inline assembler syntax is also a set of traps for the unwary. Because the compiler applies optimizations around the assembler code, the inline assembler construct must precisely describe what the inline assembler code does. This is done by using constraints and by listing registers and memory that are clobbered, that is, changed in a way which cannot be easily described. Constraints are underdocumented, machine specific, and easy to get wrong.

For a complex and underdocumented construct like inline assembler, it is naturally tempting to simply copy some existing example. Unfortunately, even minor changes to the assembler code can require changes to the constraints. Unfortunately, there is no automated way to check whether you got them right. Unfortunately, it is common for incorrect constraints to work fine in simple cases and break in complex ones, or to work fine with one gcc release and break with another.

So using inline assembler really requires reading and understanding the documentation. In particular the = and & constraints must be used correctly. On non-orthogonal machines like the x86 the register class constraints must be used correctly. In many cases it will be better to simply write the assembler code in a separate file and call it.
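As an illustration of how much the constraints carry, here is a sketch of a 128-bit add built on add-with-carry. It assumes gcc or a compatible compiler targeting x86-64 with the default AT&T assembler syntax; other targets fall back to plain C. The names struct u128 and add128 are made up for the example.

```c
#include <stdint.h>

struct u128 { uint64_t lo, hi; };

/* Hypothetical helper: returns a + b as a 128-bit value. */
struct u128 add128(struct u128 a, struct u128 b)
{
    struct u128 r;
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t lo = a.lo, hi = a.hi;
    __asm__("addq %[blo], %[lo]\n\t"    /* lo += b.lo, sets the carry flag */
            "adcq %[bhi], %[hi]"        /* hi += b.hi + carry */
            /* "+" marks an operand as both read and written.  "&" marks
               lo as earlyclobber: it is written by the first instruction,
               before the second one reads bhi, so the compiler must not
               let lo share a register with any input. */
            : [lo] "+&r"(lo), [hi] "+r"(hi)
            : [blo] "r"(b.lo), [bhi] "r"(b.hi)
            : "cc");                    /* the flags are modified */
    r.lo = lo;
    r.hi = hi;
#else
    /* Portable fallback: detect the carry by checking for wraparound. */
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
#endif
    return r;
}
```

Dropping the "&" would probably still pass most tests, since the compiler only shares registers when it can prove two operands hold the same value. That is the kind of mistake that works in simple cases and breaks later.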

Several years ago I sketched out a different approach that might be easier to use in some cases. However, actually implementing something along those lines requires embedding the assembler into the compiler. This is unlikely to ever actually happen. I’m certainly not working on it.

Multi Debugging

Many programs these days are written using multiple threads, multiple processes, and multiple languages. Our current debugging solutions don’t cope particularly well with any of those.

gdb supports multiple threads. However, the interface is hard to work with. You have to select which thread you want to look at. Threads are referred to using numbers which are relatively arbitrary; it would be helpful to be able to say things like “show me the server thread.” When a thread releases a lock, it would be helpful to be able to automatically switch to the thread which acquires the lock.

When debugging multiple processes, the most interesting case is handling remote calls between the processes. It would be desirable to be able to switch easily from the process making the call to the process executing the call. Naturally multiple processes may be running on different systems, so this requires communicating with different machines during the debugging session. Multi-process core files would also be interesting.

Debugging multiple languages is a difficult case, but one applicable to many web-based applications. It’s normal for code to move in and out of a scripting language, such as Python, and underlying C/C++/Java code. In the multiple process case you may also have some code running in a browser written in JavaScript.

For code with strong interfaces, multi-process and multi-language debugging is less interesting. However, the reality of today’s programs is that they aren’t written with strong interfaces, and program logic moves between different components. A flexible and powerful debugger could be very useful.

There is a lot of interesting work going on with gdb these days. Making gdb more powerful is hard, but I hope that it will be possible.

Version Control Wish

A lot of smart people have thought much harder than I have about version control systems, and I am by no means an expert on them. That said, this is what I want from a VCS, beyond the obvious: I want to be able to name a patch. I want to be able to easily transfer that patch from one branch to another. I want to be able to add chunks to the patch, and modify existing chunks. If I earlier transferred the patch to another branch, I want to be able to easily move the modifications I made.

Clearly there is a sense in which a patch is a branch. But it isn’t a branch in the usual sense. I may have several active patches which live on my development branch. When I update my development branch–sync it to the master sources, or in general to other repositories–I want my patches to update also. When I want to move a patch to a release branch, I want the VCS to roll the patch back to the current merge point of the development branch and the release branch, and to apply that modified patch to the release branch.

For example, let’s say that patch P was started on the development branch at version Rd. Let’s say that release branch B was branched off of the development branch at version Rb. I do some work on P, and then I update the development branch to version Re, and then I do some more work on P. Now I’m happy with patch P and I want to put it on the release branch. I want the VCS to get P out of the development branch. I want it to reverse apply the diffs from Re back to Rb. I want it to take the resulting diff and apply it to the release branch.

Then I want to work on patch P some more, and then move it over to the release branch again. Now I want the VCS to pick up the changes since I last moved it over and apply only those changes (after, of course, removing any changes I dragged in from other people).

I want P to have a name, not a revision number, and I want these operations to be simple VCS commands, not complicated scripts.

Naturally merge conflicts are possible at several different stages here, and the final output may have to include several different bits of source code for each conflict. Or perhaps the VCS could ask me what to do as it goes along; that would be OK.

These are the sorts of operations I find myself doing fairly regularly. Obviously I can do them with any VCS, by using manual bookkeeping and attention to detail. I have yet to find any VCS which makes them simple.
