Linkers part 15

COMDAT sections

In C++ there are several constructs which do not clearly live in a single place. Examples are inline functions defined in a header file, virtual tables, and typeinfo objects. There must be only a single instance of each of these constructs in the final linked program (actually we could probably get away with multiple copies of a virtual table, but the others must be unique since it is possible to take their address). Unfortunately, there is not necessarily a single object file in which they should be generated. These types of constructs are sometimes described as having vague linkage.

Linkers implement these features by using COMDAT sections (there may be other approaches, but this is the only one I know of). COMDAT sections are a special type of section. Each COMDAT section has a special string. When the linker sees multiple COMDAT sections with the same special string, it will only keep one of them.
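To make the rule concrete, here is a minimal sketch of the "keep one per string" behaviour. It is only a model: the Section struct, its fields, and the function name are hypothetical, and a real linker tracks far more state than this.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of an input section; not any real linker's data structure.
struct Section {
    std::string object_file;  // input file the section came from
    std::string comdat_key;   // the "special string"; empty means not COMDAT
};

// Keep the first COMDAT section seen for each key and drop later ones.
// Sections with an empty key are ordinary sections and are always kept.
std::vector<Section> keep_first_comdat(const std::vector<Section>& inputs) {
    std::unordered_set<std::string> seen;
    std::vector<Section> kept;
    for (const Section& s : inputs) {
        // insert() reports whether the key was new to the set.
        if (s.comdat_key.empty() || seen.insert(s.comdat_key).second) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

Given sections keyed f1 from a.o and b.o, this keeps only the one from a.o. Note that nothing compares the contents of the two sections.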

For example, when the C++ compiler sees an inline function f1 defined in a header file, but the compiler is unable to inline the function in all uses (perhaps because something takes the address of the function), the compiler will emit f1 in a COMDAT section associated with the string f1. After the linker sees a COMDAT section f1, it will discard all subsequent f1 COMDAT sections.
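As an illustration, the situation looks roughly like this in source form. The function body and the helper names are made up for the example; only the name f1 comes from the text above. The comment about where the compiler places the out-of-line copy describes what GCC and Clang typically do on ELF targets, not a guarantee.

```cpp
// Pretend this definition lives in a header included by several source files.
inline int f1(int x) {
    return x * 2 + 1;
}

using IntFn = int (*)(int);

// Taking the address means the compiler cannot rely on inlining alone:
// it has to emit an out-of-line copy of f1 in this object file. Every
// object file that does the same gets its own copy, typically in a
// COMDAT section keyed on f1's (mangled) name.
IntFn address_of_f1() {
    return &f1;
}

// Calls f1 through the pointer rather than directly.
int call_through_pointer(int x) {
    return address_of_f1()(x);
}
```

After linking, every source file that takes the address of f1 must see the same pointer value, which is why only one copy can survive.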

This obviously raises the possibility that there will be two entirely different inline functions named f1, defined in different header files. This would be an invalid C++ program, violating the One Definition Rule (often abbreviated ODR). Unfortunately, if no source file included both header files, the compiler would be unable to diagnose the error. And, unfortunately, the linker would simply discard the duplicate COMDAT sections, and would not notice the error either. This is an area where some improvements are needed (at least in the GNU tools; I don’t know whether any other tools diagnose this error correctly).

The Microsoft PE object file format provides COMDAT sections. These sections can be marked so that duplicate COMDAT sections which do not have identical contents cause an error. That is not as helpful as it seems, as different compiler options may cause valid duplicates to have different contents. The string associated with a COMDAT section is stored in the symbol table.

Before I learned about the Microsoft PE format, I introduced a different type of COMDAT sections into the GNU ELF linker, following a suggestion from Jason Merrill. Any section whose name starts with “.gnu.linkonce.” is a COMDAT section. The associated string is simply the section name itself. Thus the inline function f1 would be put into the section “.gnu.linkonce.f1”. This simple implementation works well enough, but it has a flaw in that some functions require data in multiple sections; e.g., the instructions may be in one section and associated static data may be in another section. Since different instances of the inline function may be compiled differently, the linker cannot reliably and consistently discard duplicate data (I don’t know how the Microsoft linker handles this problem).
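The naming rule is simple enough to sketch in a few lines. The function name here is hypothetical and this is not the GNU linker's code; it only restates the prefix test described above.

```cpp
#include <string>

// Returns true if the section name marks a ".gnu.linkonce." style COMDAT
// section. Under this scheme the COMDAT key is the whole section name,
// so no separate lookup is needed.
bool is_linkonce_section(const std::string& section_name) {
    static const std::string prefix = ".gnu.linkonce.";
    // compare() on a name shorter than the prefix returns nonzero.
    return section_name.compare(0, prefix.size(), prefix) == 0;
}
```

Because the key is tied to one section name, two sections that belong to the same function get two unrelated keys, which is the flaw described above.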

Recent versions of ELF introduce section groups. These implement an officially sanctioned version of COMDAT in ELF, and avoid the problem of “.gnu.linkonce” sections. I described these briefly in an earlier blog entry. A special section of type SHT_GROUP contains a list of section indices in the group. The group is retained or discarded as a whole. The string associated with the group is found in the symbol table. Putting the string in the symbol table makes it awkward to retrieve, but since the string is generally the name of a symbol it means that the string only needs to be stored once in the object file; this is a minor optimization for C++ in which symbol names may be very long.
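Here is a small sketch of how the contents of an SHT_GROUP section can be read. In the ELF gABI the section data is an array of 32-bit words: the first word holds flags (GRP_COMDAT is 0x1) and the remaining words are section header indices. The struct and function names are hypothetical, and the sketch assumes the words have already been read from the file and byte-swapped if necessary.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kGrpComdat = 0x1;  // GRP_COMDAT flag from the ELF gABI

// Hypothetical in-memory view of one section group.
struct SectionGroup {
    bool is_comdat = false;
    std::vector<std::uint32_t> members;  // section header indices in the group
};

// Decode the words of an SHT_GROUP section: flags first, then members.
SectionGroup parse_group(const std::vector<std::uint32_t>& words) {
    SectionGroup group;
    if (words.empty()) {
        return group;  // malformed or empty group; nothing to decode
    }
    group.is_comdat = (words[0] & kGrpComdat) != 0;
    group.members.assign(words.begin() + 1, words.end());
    return group;
}

// Discard the group as a whole: every member section is marked together.
// Assumes every member index is smaller than discarded.size().
void discard_group(const SectionGroup& group, std::vector<bool>& discarded) {
    for (std::uint32_t index : group.members) {
        discarded[index] = true;
    }
}
```

The key string is not in this data at all; it comes from the symbol table, as described above.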

More tomorrow.

2 Comments

  1. tromey said,

    September 20, 2007 @ 12:36 pm

    FWIW, for the compile server I’m looking into a repository-like approach for things that would ordinarily have vague linkage. Or, perhaps I’ll generate them once and then link each into the object files requested by the compilation job. The latter approach may be somewhat slower but has the benefit of creating objects with the expected contents.

    Luckily all this is a ways off, so I don’t have to make any hard decisions soon.

  2. Joe Buck said,

    October 8, 2007 @ 8:04 pm

    There’s another related feature of most C++ implementations, invented by Stroustrup (or one of his colleagues), used also by g++. Rather than emitting the virtual function table definition in every object file and using COMDAT, it is emitted in the .o file that contains the definition of the first non-inline virtual function. By the one-definition rule there must be only one such file; doing it this way saves considerable space in .o files. COMDAT is used if all of the functions are defined inline or in the class definition. The typeinfo object for the class is handled in the same way, as are “out-of-line” definitions for virtual functions that are inline.

    This optimization sometimes leads to confusing messages from the linker if there is a missing definition for this first virtual function. I recall that Sun’s linker would generate a message saying something like

    virtual function table for class Foo is undefined
    [ hint: see if the first non-inline virtual function of Foo is defined ]

    while the GNU linker only gave the first message (or would complain about a missing typeinfo object).
