Archive for Programming

Archive Alignment

gold normally simply mmaps input files and reads the data directly from the mapped memory. In general this requires that all the data structures in the file be properly aligned, which the ELF standard guarantees. The x86, of course, does not require proper alignment for memory loads and stores, so this assumption was never actually tested there. When David Miller tested on a SPARC host, he saw crashes.

It turns out that the problem was object files in archives. Archives only guarantee alignment to a 16-bit boundary, while ELF data structures require 32-bit or 64-bit alignment (the latter for 64-bit ELF). Thus an object file stored in an archive can be misaligned within the mmapped image.

I fixed this in gold in the obvious way. But it suggests that changing the archive format to enforce 64-bit alignment could give better linker performance. Unfortunately the archive format, which dates back to the days of the old a.out object format, is quite simple. The size of each object file is stored, but not the size it occupies in the archive. And there is no global table of contents pointing to each object file. So a general archive reader must walk through the archive one object file at a time, aligning to a 16-bit boundary after each member. There is no provision for recording a larger required alignment.

The only way I see to get optimal performance would be to actually define a new archive format, with a new magic string. It’s not clear that the compatibility hassles would be worth it.

Comments (9)

GCC Exception Frames

When an exception is thrown in C++ and caught by one of the calling functions, the supporting libraries need to unwind the stack. With gcc this is done using a variant of DWARF debugging information. The unwind information is loaded at runtime, but is not read unless an exception is thrown. That means that the unwind library needs to have some way of finding the appropriate unwind information at runtime.

On some systems, this is done by registering the exception frame information when the program starts. The registration is done with a variant of the handling of C++ constructors. This becomes interesting when one shared library can throw an exception which is caught by another shared library. It is possible for such a case to arise when the executable itself never throws exceptions and therefore has no frames to register. Obviously the unwinder needs to be able to find the unwind information for both shared libraries, which means that both shared libraries need to use the same registration functions. With gcc this is normally ensured by putting the unwind code in a shared library, libgcc_s.so. Each shared library, and sometimes the executable, will use libgcc_s.so. That ensures a single copy of the registration and unwind functions, so the library will be able to reliably unwind across shared libraries. With gcc the use of libgcc_s.so can be controlled with the -shared-libgcc and -static-libgcc options. Normally the right thing will happen by default.

That approach has a cost: there is an extra shared library, and there is a small cost of registering the unwind information at program startup or library load time (and unregistering it if a shared library is unloaded via dlclose). There is now a better way, which requires linker support.

Both gold and the GNU linker support the command line option --eh-frame-hdr. With this option, when the linker sees the .eh_frame sections used to hold the unwind information, it automatically builds a header. This header is a sorted array mapping program counter addresses to unwind information. The header is recorded as a program segment of type PT_GNU_EH_FRAME. (This is a little bit ugly since the .eh_frame sections are recognized only by name; ideally they should have a special section type.)

At runtime, the unwind library can use the dl_iterate_phdr function to find the program segments of the executable and all currently loaded shared libraries. It can use that to find the PT_GNU_EH_FRAME segments, and use the sorted array in those segments to quickly find the unwind information.

This approach means that no registration functions are required. It also means that it is not necessary to have a single shared library, since dl_iterate_phdr is available no matter which shared library throws the exception.

This all only works if you have a linker which supports generating PT_GNU_EH_FRAME sections, if all the shared libraries and the executable are linked by such a linker, and if you have a working dl_iterate_phdr function in your C library or dynamic linker. I think that pretty much restricts this approach to GNU/Linux and possibly other free operating systems. For those scenarios, I hope that gcc will soon be able to stop using libgcc_s.so by default.

Comments (2)

Concurrent linking

There is still work to do on gold. But once that is done, what is the next big step? In the long term we need an incremental linker, but I’m also interested in an idea which I call concurrent linking. A concurrent linker runs at the same time as the compiler. As each object file is compiled, the linker is notified, and the linker adds the object file to the executable that it is building. When the last compilation is completed, the linker writes out the fully linked executable.

The idea is to keep the linker from being a serializing step in a compilation. It is normally easy to run many compilations in parallel, as each compilation is independent. Traditionally, however, the linker can not start until all the compilations are complete, and it must read all the input files at that time.

A concurrent linker can instead run at the same time as the compilations. There is no long wait after the compilations are complete. Also, the newly generated object file should be in the disk cache, and so the linker will have to do less actual disk I/O. I expect that this would only be a noticeable improvement for large programs, but then those are the cases where the linker really is a bottleneck.

The key to making this work will be careful control of the symbol table, and efficient tracking of relocations. In general relocations for an object file may only be resolved when they refer to symbols defined in objects which appear on the link command line. In other cases we must remember the relocation. If the symbol is already known to be defined in an object which appears later on the link line, it would probably be appropriate to resolve the relocation to that symbol, but to record it in case the symbol definition is changed later.

In general an executable linked in this way may have additional text and data segments, and they won’t be as tightly packed. Thus the resulting executable will most likely have slightly worse paging behaviour and will run slightly slower. So this technique is only appropriate during the development cycle, not during a release build.

Comments (15)

Gold Released

I have finally released gold, the new ELF linker I’ve been working on, to the free software world. It is now part of the GNU binutils. I sent an announcement to the binutils mailing list.

Comments (6)

Compiler Warnings

There is an ongoing issue within gcc as to where warnings should be issued. The question is whether warnings should be issued only by the frontend, or whether it is also OK for the optimizers to issue them.

The advantage of issuing warnings in the frontend is that the warnings are independent of optimization level. Also, the warnings are issued much closer to the user’s code, which makes it easier for the user to understand what they mean, and easier to give very good location information.

The advantage of issuing warnings from the optimizers is that some warnings require the data structures built by the optimizers. A good example in gcc is the new strict aliasing warning. This warns about code which may violate the type aliasing restrictions in the language standard. Detecting this requires tracking how pointers are used in the program.

Some other warnings that gcc emits after the frontend are:

  • Warnings about large stack frames.
  • Warnings about optimizations which rely on undefined signed overflow.
  • Warnings about variables which may be changed by setjmp.
  • Some warnings about comparisons which are always true or false.
  • Warnings about unsafe loop optimizations.
  • Warnings about unused values.
  • Warnings about unreachable code.
  • Warnings about uninitialized variables.

The last two in this list are particularly troublesome. They are simple warnings, but whether they are issued changes based on the optimization level and changes from one compiler release to another. This is confusing and frustrating for users: a new compiler release can mean a new set of warnings.

On the other hand, the optimizers can do a better job. For example, the uninitialized variable warning can warn about uninitialized fields in a structure. It can also warn about variables which are passed to an inlined function by address, but are never initialized by that function. It would be possible to do these in the frontend, but harder.

The right choice may be to separate the warnings: do some basic checking in the frontend, and use different options for the warnings issued by the optimizers.

Comments (4)
