Archive for Programming

Layered Programming

Many programs today are written at a very high level. They run in an interpreted environment rather than being compiled. Often many different components running in different interpreted environments are hooked together. HTML and XML, for example, started out as markup languages, but now they are often also used as components of programs hooking together the output of different servers.

Computer programming has always been based on layering and abstraction. The processor abstracts the transistor, the traditional programming language abstracts the processor, the kernel abstracts the hardware. What seems fairly new to me is the speed at which these layers change, and their complexity. New ideas are implemented in the form of extensive libraries. Each library can be learned in isolation, but there is no unifying principle across libraries.

It is becoming increasingly difficult to be a systems expert. When I learned to program, it was possible to understand your entire program from the source code, in whatever language, down to the machine code. When writing a modern Ajax application, that is simply impossible. There are too many different interpreters. There is too much code involved. Even fixing on a new base level above the processor–perhaps the browser–doesn’t help. This all leads to decreased performance, which is sometimes important, and decreased security, which is often important.

We can’t go back. What I wonder is whether we will again settle on a programming model which can be understood at all relevant layers, or whether things are just going to get increasingly complicated.


Peer Review

Peer review can be a useful technique when programming. It ensures that at least one other person has read the code. It can catch dumb bugs and help ensure that the code is not unnecessarily obscure. Several popular programming methodologies use it. (Pair programming has the same benefits.)

Peer review has one obvious disadvantage: it slows down coding. In order for peer review to be meaningful, you have to present digestible chunks for review. That means either waiting for the review, or using some sort of patch management to permit continued coding until the review is complete and to incorporate changes suggested by the review.

I generally have not worked on projects that require peer review. The gcc project requires maintainer approval of all changes, but maintainers are permitted to commit their own changes without review. I can see the advantages of a peer review system, provided there is some mechanism to ensure that reviews happen quickly. If reviews can linger, then projects can stall very quickly.

gcc has a difficult enough time getting patches reviewed as it is. It’s hard to recommend anything which would make it slower. One approach that might make it more acceptable would be to say that if a maintainer writes a patch, the peer review can be done by anybody–it would not have to be another maintainer. That is, require a reviewer for every patch, but only require that either the author or the reviewer be a maintainer.

I’m not sure whether this would be a good idea or not. It would be good to improve the quality of the gcc code base, but the quality is not so bad that drastic measures are required. Only a small additional cost would be acceptable.


Linker relro

gcc, the GNU linker, and the glibc dynamic linker cooperate to implement an idea called read-only relocations, or relro. This permits the linker to designate a part of an executable or (more commonly) a shared library as being read-only after dynamic relocations have been applied.

This may be used for read-only global variables which are initialized to something which requires a relocation, such as the address of a function or a different global variable. Because the global variable requires a runtime initialization in the form of a dynamic relocation, it cannot be placed in a read-only segment. However, because it is declared to be constant, and therefore may not be changed by the program, the dynamic linker can mark it as read-only after the dynamic relocation has been applied.

For some targets this technique may also be used for the PLT or parts of the GOT.

Making these pages read-only helps catch some cases of memory corruption, and making the PLT in particular read-only helps prevent some types of buffer overflow exploits.

The first step is in gcc. When gcc sees a variable which is constant but requires a dynamic relocation, it puts it into a section named .data.rel.ro (this functionality unfortunately relies on magic section names). A variable which requires a dynamic relocation against a local symbol is put into a .data.rel.ro.local section; this helps group such variables together, so that the dynamic linker may apply the relocations, which will always be RELATIVE relocations, more efficiently, especially when using combreloc.

The linker groups .data.rel.ro and .data.rel.ro.local sections as usual. The new step is that the linker then emits a PT_GNU_RELRO program segment which covers these sections. If the PLT and/or GOT can be read-only after dynamic relocations, they are put next to the .data.rel.ro sections and also become part of the new segment. This segment will be enclosed within a PT_LOAD segment. The p_vaddr field of the PT_GNU_RELRO segment gives the virtual address of the start of the region that becomes read-only after dynamic relocation, and the p_memsz field gives its length.

When the dynamic linker sees a PT_GNU_RELRO segment, it uses mprotect to mark the pages as read-only after the dynamic relocations have been applied. Of course this only works if the segment does in fact cover an entire page. The linker will try to force this to happen.

Note that the current dynamic linker code will only work correctly if the PT_GNU_RELRO segment starts on a page boundary. This is because the dynamic linker rounds the p_vaddr field down to the previous page boundary. If there is anything on the page which should not be read-only, the program is likely to fail at runtime. So in effect the linker must only emit a PT_GNU_RELRO segment if it ensures that it starts on a page boundary.

I see this as a relatively minor security benefit. It is not an optimization as far as I can see. I am documenting it here as part of my general documentation of obscure linker features. The current description of this feature in the GNU linker manual is rather obscure.


GCC in C++

It is time to start using C++ in gcc. gcc was originally written in C. C++ has now advanced to the point where we can reasonably take advantage of the new features that it provides. The most obvious advantage would be in data structures. gcc implements data structures which are awkward to use for different types. With C++ they could become much simpler. The target structure is naturally implemented as a base class, which would simplify target code. The double-wide integer values could be naturally represented as a small class with operators, again simplifying the code and making it easier to understand.

This would be an easy transition, as the code is already almost completely written in the shared subset of C and C++. One of the arguments against converting to C++ is that the code would be less efficient, but it’s not as though the C code would become less efficient because we were compiling with a C++ compiler. Certainly we would have to pay close attention to efficiency with new changes, but that is no different from what we do today.

The other argument against C++ is that the language has too many complicated features. I think that gcc’s review system will ensure that new code is at least as readable as the old code. In any case, programmers these days learn C++ in school. It is not so complex that gcc developers cannot understand it.

The only real technical difficulty I see is that we would have to make bootstrapping work with the right libstdc++. I’m sure this is possible. We would also have to explicitly make sure that new versions of gcc can be compiled with old versions of gcc. This would be an addition to the release testing.

In the past Richard Stallman has objected to using C++ for gcc. I don’t know how he feels about it today. However, I believe that this sort of decision should be made by the actual developers.

If anybody has a principled argument against using C++ for gcc, I would very much like to hear it.


Linker combreloc

The GNU linker has a -z combreloc option, which is enabled by default (it can be turned off via -z nocombreloc). I just implemented this in gold as well. This option directs the linker to sort the dynamic relocations. The sorting is done in order to optimize the dynamic linker.

The dynamic linker in glibc uses a one-element cache when processing relocs: if a relocation refers to the same symbol as the previous relocation, then the dynamic linker reuses the value rather than looking up the symbol again. Thus the dynamic linker gets the best results if the dynamic relocations are sorted so that all dynamic relocations for a given dynamic symbol are adjacent.

Other than that, the linker sorts together all relative relocations, which don’t have symbols. Two relative relocations, or two relocations against the same symbol, are sorted by the address in the output file. This tends to optimize paging and caching when there are two references from the same page.

This may seem like a micro-optimization, but it can have a real effect on program startup time, especially if the program has lots of shared libraries. I’ve seen a case where a program starts up 16% faster because the relocations were sorted.

