Archive for September, 2007

Linkers part 18

Incremental Linking

Often a programmer will make change a single source file and recompile and relink the application. A standard linker will need to read all the input objects and libraries in order to regenerate the executable with the change. For a large application, this is a lot of work. If only one input object file changed, it is a lot more work than really needs to be done. One solution is to use an incremental linker. An incremental linker makes incremental changes to an existing executable or shared library, rather than rebuilding them from scratch.

I’ve never actually written or worked on an incremental linker, but the general idea is straightforward enough. When the linker writes the output file, it must attach additional information.

  • The linker must create a mapping of object files to areas in the output file, so that an incremental link will know what to remove when replacing an object file.
  • The linker must retain all the relocations for each input object which refer to symbols defined in other objects, so that it can reprocess them when symbols change. The linker should store the relocations mapped by symbol, so that it can quickly find the relevant relocations.
  • The linker should leave extra space in the text and data segments, to allow for object files to grow to a limited extent without requiring rewriting the whole executable. It must keep a map of where this extra space is, as it will tend to move over time over the course of incremental links.
  • The linker should keep a list of object file timestamps in the output file, so that it can quickly determine which objects have changed.

With this information, the linker can identify which object files have changed since the last time the output file was linked, and replace them in the existing output file. When an object file changes, the linker can identify all the relocations which refer to symbols defined in the object file, and reprocess them.

When an object file gets too large to fit in the available space in a text or data segment, then the linker has the option of creating additional text or data segments at different addresses. This requires some care to ensure that the new code does not collide with the heap, depending upon how the local malloc implementation works. Alternatively, the incremental linker could fall back on doing a full link, and allocating more space again.

Incremental linking can greatly speed up the edit/compile/debug cycle. Unfortunately it is not implemented in most common linkers. Of course an incremental link is not equivalent to a final link, and in particular some linker optimizations are difficult to implement while acting incrementally. An incremental link is really only suitable for use during the development cycle, which is course the time when the speed of the linker is most important.

More on Monday.

Comments (2)

Linkers part 17

Warning Symbols

The GNU linker supports a weird extension to ELF used to issue warnings when symbols are referenced at link time. This was originally implemented for a.out using a special symbol type. For ELF, I implemented it using a special section name.

If you create a section named .gnu.warning.SYMBOL, then if and when the linker sees an undefined reference to SYMBOL, it will issue a warning. The warning is triggered by seeing an undefined symbol with the right name in an object file. Unlike the warning about an undefined symbol, it is not triggered by seeing a relocation entry. The text of the warning is simply the contents of the .gnu.warning.SYMBOL section.

The GNU C library uses this feature to warn about references to symbols like gets which are required by standards but are generally considered to be unsafe. This is done by creating a section named .gnu.warning.gets in the same object file which defines gets.

The GNU linker also supports another type of warning, triggered by sections named .gnu.warning (without the symbol name). If an object file with a section of that name is included in the link, the linker will issue a warning. Again, the text of the warning is simply the contents of the .gnu.warning section. I don’t know if anybody actually uses this feature.

Short entry today, more tomorrow.

Comments (1)

Linkers part 16

C++ Template Instantiation

There is still more C++ fun at link time, though somewhat less related to the linker proper. A C++ program can declare templates, and instantiate them with specific types. Ideally those specific instantiations will only appear once in a program, not once per source file which instantiates the templates. There are a few ways to make this work.

For object file formats which support COMDAT and vague linkage, which I described yesterday, the simplest and most reliable mechanism is for the compiler to generate all the template instantiations required for a source file and put them into the object file. They should be marked as COMDAT, so that the linker discards all but one copy. This ensures that all template instantiations will be available at link time, and that the executable will have only one copy. This is what gcc does by default for systems which support it. The obvious disadvantages are the time required to compile all the duplicate template instantiations and the space they take up in the object files. This is sometimes called the Borland model, as this is what Borland’s C++ compiler did.

Another approach is to not generate any of the template instantiations at compile time. Instead, when linking, if we need a template instantiation which is not found, invoke the compiler to build it. This can be done either by running the linker and looking for error messages or by using a linker plugin to handle an undefined symbol error. The difficulties with this approach are to find the source code to compile and to find the right options to pass to the compiler. Typically the source code is placed into a repository file of some sort at compile time, so that it is available at link time. The complexities of getting the compilation steps right are why this approach is not the default. When it works, though, it can be faster than the duplicate instantiation approach. This is sometimes called the Cfront model.

gcc also supports explicit template instantiation, which can be used to control exactly where templates are instantiated. This approach can work if you have complete control over your source code base, and can instantiate all required templates in some central place. This approach is used for gcc’s C++ library, libstdc++.

C++ defines a keyword export which is supposed to permit exporting template definitions in such a way that they can be read back in by the compiler. gcc does not support this keyword. If it worked, it could be a slightly more reliable way of using a repository when using the Cfront model.

Exception Frames

C++ and other languages support exceptions. When an exception is thrown in one function and caught in another, the program needs to reset the stack pointer and registers to the point where the exception is caught. While resetting the stack pointer, the program needs to identify all local variables in the part of the stack being discarded, and run their destructors if any. This process is known as unwinding the stack.

The information needed to unwind the stack is normally stored in tables in the program. Supporting library code is used to read the tables and perform the necessary operations. I’m not going to describe the details of those tables here. However, there is a linker optimization which applies to them.

The support libraries need to be able to find the exception tables at runtime when an exception occurs. An exception can be thrown in one shared library and caught in a different shared library, so finding all the required exception tables can be a nontrivial operation. One approach that can be used is to register the exception tables at program startup time or shared library load time. The registration can be done at the right time using the global constructor mechanism.

However, this approach imposes a runtime cost for exceptions, in that it takes longer for the program to start. Therefore, this is not ideal. The linker can optimize this by building tables which can be used to find the exception tables. The tables built by the GNU linker are sorted for fast lookup by the runtime library. The tables are put into a PT_GNU_EH_FRAME segment. The supporting libraries then need a way to look up a segment of this type. This is done via the dl_iterate_phdr API provided by the GNU dynamic linker.

Note that if the compiler believes that the linker will generate a PT_GNU_EH_FRAME segment, it won’t generate the startup code to register the exception tables. Thus the linker must not fail to create this segment.

Since the GNU linker needs to look at the exception tables in order to generate the PT_GNU_EH_FRAME segment, it will also optimize by discarding duplicate exception table information.

I know this is section is rather short on details. I hope the general idea is clear.

More tomorrow.


Linkers part 15

COMDAT sections

In C++ there are several constructs which do not clearly live in a single place. Examples are inline functions defined in a header file, virtual tables, and typeinfo objects. There must be only a single instance of each of these constructs in the final linked program (actually we could probably get away with multiple copies of a virtual table, but the others must be unique since it is possible to take their address). Unfortunately, there is not necessarily a single object file in which they should be generated. These types of constructs are sometimes described as having vague linkage.

Linkers implement these features by using COMDAT sections (there may be other approaches, but this is the only I know of). COMDAT sections are a special type of section. Each COMDAT section has a special string. When the linker sees multiple COMDAT sections with the same special string, it will only keep one of them.

For example, when the C++ compiler sees an inline function f1 defined in a header file, but the compiler is unable to inline the function in all uses (perhaps because something takes the address of the function), the compiler will emit f1 in a COMDAT section associated with the string f1. After the linker sees a COMDAT section f1, it will discard all subsequent f1 COMDAT sections.

This obviously raises the possibility that there will be two entirely different inline functions named f1, defined in different header files. This would be an invalid C++ program, violating the One Definition Rule (often abbreviated ODR). Unfortunately, if no source file included both header files, the compiler would be unable to diagnose the error. And, unfortunately, the linker would simply discard the duplicate COMDAT sections, and would not notice the error either. This is an area where some improvements are needed (at least in the GNU tools; I don’t know whether any other tools diagnose this error correctly).

The Microsoft PE object file format provides COMDAT sections. These sections can be marked so that duplicate COMDAT sections which do not have identical contents cause an error. That is not as helpful as it seems, as different compiler options may cause valid duplicates to have different contents. The string associated with a COMDAT section is stored in the symbol table.

Before I learned about the Microsoft PE format, I introduced a different type of COMDAT sections into the GNU ELF linker, following a suggestion from Jason Merrill. Any section whose name starts with “.gnu.linkonce.” is a COMDAT section. The associated string is simply the section name itself. Thus the inline function f1 would be put into the section “.gnu.linkonce.f1”. This simple implementation works well enough, but it has a flaw in that some functions require data in multiple sections; e.g., the instructions may be in one section and associated static data may be in another section. Since different instances of the inline function may be compiled differently, the linker can not reliably and consistently discard duplicate data (I don’t know how the Microsoft linker handles this problem).

Recent versions of ELF introduce section groups. These implement an officially sanctioned version of COMDAT in ELF, and avoid the problem of “.gnu.linkonce” sections. I described these briefly in an earlier blog entry. A special section of type SHT_GROUP contains a list of section indices in the group. The group is retained or discarded as a whole. The string associated with the group is found in the symbol table. Putting the string in the symbol table makes it awkward to retrieve, but since the string is generally the name of a symbol it means that the string only needs to be stored once in the object file; this is a minor optimization for C++ in which symbol names may be very long.

More tomorrow.

Comments (2)

Linkers part 14

Link Time Optimization

I’ve already mentioned some optimizations which are peculiar to the linker: relaxation and garbage collection of unwanted sections. There is another class of optimizations which occur at link time, but are really related to the compiler. The general name for these optimizations is link time optimization or whole program optimization.

The general idea is that the compiler optimization passes are run at link time. The advantage of running them at link time is that the compiler can then see the entire program. This permits the compiler to perform optimizations which can not be done when sources files are compiled separately. The most obvious such optimization is inlining functions across source files. Another is optimizing the calling sequence for simple functions–e.g., passing more parameters in registers, or knowing that the function will not clobber all registers; this can only be done when the compiler can see all callers of the function. Experience shows that these and other optimizations can bring significant performance benefits.

Generally these optimizations are implemented by having the compiler write a version of its intermediate representation into the object file, or into some parallel file. The intermediate representation will be the parsed version of the source file, and may already have had some local optimizations applied. Sometimes the object file contains only the compiler intermediate representation, sometimes it also contains the usual object code. In the former case link time optimization is required, in the latter case it is optional.

I know of two typical ways to implement link time optimization. The first approach is for the compiler to provide a pre-linker. The pre-linker examines the object files looking for stored intermediate representation. When it finds some, it runs the link time optimization passes. The second approach is for the linker proper to call back into the compiler when it finds intermediate representation. This is generally done via some sort of plugin API.

Although these optimizations happen at link time, they are not part of the linker proper, at least not as I defined it. When the compiler reads the stored intermediate representation, it will eventually generate an object file, one way or another. The linker proper will then process that object file as usual. These optimizations should be thought of as part of the compiler.

Initialization Code

C++ permits globals variables to have constructors and destructors. The global constructors must be run before main starts, and the global destructors must be run after exit is called. Making this work requires the compiler and the linker to cooperate.

The a.out object file format is rarely used these days, but the GNU a.out linker has an interesting extension. In a.out symbols have a one byte type field. This encodes a bunch of debugging information, and also the section in which the symbol is defined. The a.out object file format only supports three sections–text, data, and bss. Four symbol types are defined as sets: text set, data set, bss set, and absolute set. A symbol with a set type is permitted to be defined multiple times. The GNU linker will not give a multiple definition error, but will instead build a table with all the values of the symbol. The table will start with one word holding the number of entries, and will end with a zero word. In the output file the set symbol will be defined as the address of the start of the table.

For each C++ global constructor, the compiler would generate a symbol named __CTOR_LIST__ with the text set type. The value of the symbol in the object file would be the global constructor function. The linker would gather together all the __CTOR_LIST__ functions into a table. The startup code supplied by the compiler would walk down the __CTOR_LIST__ table and call each function. Global destructors were handled similarly, with the name __DTOR_LIST__.

Anyhow, so much for a.out. In ELF, global constructors are handled in a fairly similar way, but without using magic symbol types. I’ll describe what gcc does. An object file which defines a global constructor will include a .ctors section. The compiler will arrange to link special object files at the very start and very end of the link. The one at the start of the link will define a symbol for the .ctors section; that symbol will wind up at the start of the section. The one at the end of the link will define a symbol for the end of the .ctors section. The compiler startup code will walk between the two symbols, calling the constructors. Global destructors work similarly, in a .dtors section.

ELF shared libraries work similarly. When the dynamic linker loads a shared library, it will call the function at the DT_INIT tag if there is one. By convention the ELF program linker will set this to the function named _init, if there is one. Similarly the DT_FINI tag is called when a shared library is unloaded, and the program linker will set this to the function named _fini.

As I mentioned earlier, three are also DT_INIT_ARRAY, DT_PREINIT_ARRAY, and DT_FINI_ARRAY tags, which are set based on the SHT_INIT_ARRAY, SHT_PREINIT_ARRAY, and SHT_FINI_ARRAY section types. This is a newer approach in ELF, and does not require relying on special symbol names.

More tomorrow.

Comments (2)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »