Archive for September, 2007

Linkers part 8

ELF Segments

Earlier I said that executable file formats were normally the same as object file formats. That is true for ELF, but with a twist. In ELF, object files are composed of sections: all the data in the file is accessed via the section table. Executables and shared libraries normally contain a section table, which is used by programs like nm. But the operating system and the dynamic linker do not use the section table. Instead, they use the segment table, which provides an alternative view of the file.

All the contents of an ELF executable or shared library which are to be loaded into memory are contained within a segment (an object file does not have segments). A segment has a type, some flags, a file offset, a virtual address, a physical address, a file size, a memory size, and an alignment. The file offset points to a contiguous set of bytes which are the contents of the segment, the bytes to load into memory. When the operating system or the dynamic linker loads a file, it will do so by walking through the segments and loading them into memory (typically by using the mmap system call). All the information needed by the dynamic linker–the dynamic relocations, the dynamic symbol table, etc.–are accessed via information stored in special segments.

Although an ELF executable or shared library does not, strictly speaking, require any sections, they normally do have them. The contents of a loadable section will fall entirely within a single segment.

The program linker reads sections from the input object files. It sorts and concatenates them into sections in the output file. It maps all the loadable sections into segments in the output file. It lays out the section contents in the output file segments respecting alignment and access requirements, so that the segments may be mapped directly into memory. The sections are mapped to segments based on the access requirements: normally all the read-only sections are mapped to one segment and all the writable sections are mapped to another segment. The address of the latter segment will be set so that it starts on a separate page in memory, permitting mmap to set different permissions on the mapped pages.

The segment flags are a bitmask which define access requirements. The defined flags are PF_R, PF_W, and PF_X, which mean, respectively, that the contents must be made readable, writable, or executable.

The segment virtual address is the memory address at which the segment contents are loaded at runtime. The physical address is officially undefined, but is often used as the load address when using a system which does not use virtual memory. The file size is the size of the contents in the file. The memory size may be larger than the file size when the segment contains uninitialized data; the extra bytes will be filled with zeroes. The alignment of the segment is mainly informative, as the address is already specified.

The ELF segment types are as follows:

  • PT_NULL: A null entry in the segment table, which is ignored.
  • PT_LOAD: A loadable entry in the segment table. The operating system or dynamic linker load all segments of this type. All other segments with contents will have their contents contained completely within a PT_LOAD segment.
  • PT_DYNAMIC: The dynamic segment. This points to a series of dynamic tags which the dynamic linker uses to find the dynamic symbol table, dynamic relocations, and other information that it needs.
  • PT_INTERP: The interpreter segment. This appears in an executable. The operating system uses it to find the name of the dynamic linker to run for the executable. Normally all executables will have the same interpreter name, but on some operating systems different interpreters are used in different emulation modes.
  • PT_NOTE: A note segment. This contains system dependent note information which may be used by the operating system or the dynamic linker. On GNU/Linux systems shared libraries often have a ABI tag note which may be used to specify the minimum version of the kernel which is required for the shared library. The dynamic linker uses this when selecting among different shared libraries.
  • PT_SHLIB: This is not used as far as I know.
  • PT_PHDR: This indicates the address and size of the segment table. This is not too useful in practice as you have to have already found the segment table before you can find this segment.
  • PT_TLS: The TLS segment. This holds the initial values for TLS variables.
  • PT_GNU_EH_FRAME (0x6474e550): A GNU extension used to hold a sorted table of unwind information. This table is built by the GNU program linker. It is used by gcc’s support library to quickly find the appropriate handler for an exception, without requiring exception frames to be registered when the program start.
  • PT_GNU_STACK (0x6474e551): A GNU extension used to indicate whether the stack should be executable. This segment has no contents. The dynamic linker sets the permission of the stack in memory to the permissions of this segment.
  • PT_GNU_RELRO (0x6474e552): A GNU extension which tells the dynamic linker to set the given address and size to be read-only after applying dynamic relocations. This is used for const variables which require dynamic relocations.

ELF Sections

Now that we’ve done segments, lets take a quick look at the details of ELF sections. ELF sections are more complicated than segments, in that there are more types of sections. Every ELF object file, and most ELF executables and shared libraries, have a table of sections. The first entry in the table, section 0, is always a null section.

ELF sections have several fields.

  • Name.
  • Type. I discuss section types below.
  • Flags. I discuss section flags below.
  • Address. This is the address of the section. In an object file this is normally zero. In an executable or shared library it is the virtual address. Since executables are normally accessed via segments, this is essentially documentation.
  • File offset. This is the offset of the contents within the file.
  • Size. The size of the section.
  • Link. Depending on the section type, this may hold the index of another section in the section table.
  • Info. The meaning of this field depends on the section type.
  • Address alignment. This is the required alignment of the section. The program linker uses this when laying out the section in memory.
  • Entry size. For sections which hold an array of data, this is the size of one data element.

These are the types of ELF sections which the program linker may see.

  • SHT_NULL: A null section. Sections with this type may be ignored.
  • SHT_PROGBITS: A section holding bits of the program. This is an ordinary section with contents.
  • SHT_SYMTAB: The symbol table. This section actually holds the symbol table itself. The section contents are an array of ELF symbol structures.
  • SHT_STRTAB: A string table. This type of section holds null-terminated strings. Sections of this type are used for the names of the symbols and the names of the sections themselves.
  • SHT_RELA: A relocation table. The link field holds the index of the section to which these relocations apply. These relocations include addends.
  • SHT_HASH: A hash table used by the dynamic linker to speed symbol lookup.
  • SHT_DYNAMIC: The dynamic tags used by the dynamic linker. Normally the PT_DYNAMIC segment and the SHT_DYNAMIC section will point to the same contents.
  • SHT_NOTE: A note section. This is used in system dependent ways. A loadable SHT_NOTE section will become a PT_NOTE segment.
  • SHT_NOBITS: A section which takes up memory space but has no associated contents. This is used for zero-initialized data.
  • SHT_REL: A relocation table, like SHT_RELA but the relocations have no addends.
  • SHT_SHLIB: This is not used as far as I know.
  • SHT_DYNSYM: The dynamic symbol table. Normally the DT_SYMTAB dynamic tag will point to the same contents as this section (I haven’t discussed dynamic tags yet, though).
  • SHT_INIT_ARRAY: This section holds a table of function addresses which should each be called at program startup time, or, for a shared library, when the library is opened by dlopen.
  • SHT_FINI_ARRAY: Like SHT_INIT_ARRAY, but called at program exit time or dlclose time.
  • SHT_PREINIT_ARRAY: Like SHT_INIT_ARRAY, but called before any shared libraries are initialized. Normally shared libraries initializers are run before the executable initializers. This section type may only be linked into an executable, not into a shared library.
  • SHT_GROUP: This is used to group related sections together, so that the program linker may discard them as a unit when appropriate. Sections of this type may only appear in object files. The contents of this type of section are a flag word followed by a series of section indices.
  • SHT_SYMTAB_SHNDX: ELF symbol table entries only provide a 16-bit field for the section index. For a file with more than 65536 sections, a section of this type is created. It holds one 32-bit word for each symbol. If a symbol’s section index is SHN_XINDEX, the real section index may be found by looking in the SHT_SYMTAB_SHNDX section.
  • SHT_GNU_LIBLIST (0x6ffffff7): A GNU extension used by the prelinker to hold a list of libraries found by the prelinker.
  • SHT_GNU_verdef (0x6ffffffd): A Sun and GNU extension used to hold version definitions (I’ll take about symbol versions at some point).
  • SHT_GNU_verneed (0x6ffffffe): A Sun and GNU extension used to hold versions required from other shared libraries.
  • SHT_GNU_versym (0x6fffffff): A Sun and GNU extension used to hold the versions for each symbol.

These are the types of section flags.

  • SHF_WRITE: Section contains writable data.
  • SHF_ALLOC: Section contains data which should be part of the loaded program image. For example, this would normally be set for a SHT_PROGBITS section and not set for a SHT_SYMTAB section.
  • SHF_EXECINSTR: Section contains executable instructions.
  • SHF_MERGE: Section contains constants which the program linker may merge together to save space. The compiler can use this type of section for read-only data whose address is unimportant.
  • SHF_STRINGS: In conjunction with SHF_MERGE, this means that the section holds null terminated string constants which may be merged.
  • SHF_INFO_LINK: This flag indicates that the info field in the section holds a section index.
  • SHF_LINK_ORDER: This flag tells the program linker that when it combines sections, this section must appear in the same relative order as the section in the link field. This can be used to ensure that address tables are built in the expected order.
  • SHF_OS_NONCONFORMING: If the program linker sees a section with this flag, and does not understand the type or all other flags, then it must issue an error.
  • SHF_GROUP: This section appears in a group (see SHT_GROUP, above).
  • SHF_TLS: This section holds TLS data.

Comments (1)

Linkers part 7

As we’ve seen, what linkers do is basically quite simple, but the details can get complicated. The complexity is because smart programmers can see small optimizations to speed up their programs a little bit, and somtimes the only place those optimizations can be implemented is the linker. Each such optimizations makes the linker a little more complicated. At the same time, of course, the linker has to run as fast as possible, since nobody wants to sit around waiting for it to finish. Today I’ll talk about a classic small optimization implemented by the linker.

Thread Local Storage

I’ll assume you know what a thread is. It is often useful to have a global variable which can take on a different value in each thread (if you don’t see why this is useful, just trust me on this). That is, the variable is global to the program, but the specific value is local to the thread. If thread A sets the thread local variable to 1, and thread B then sets it to 2, then code running in thread A will continue to see the value 1 for the variable while code running in thread B sees the value 2. In Posix threads this type of variable can be created via pthread_key_create and accessed via pthread_getspecific and pthread_setspecific.

Those functions work well enough, but making a function call for each access is awkward and inconvenient. It would be more useful if you could just declare a regular global variable and mark it as thread local. That is the idea of Thread Local Storage (TLS), which I believe was invented at Sun. On a system which supports TLS, any global (or static) variable may be annotated with __thread. The variable is then thread local.

Clearly this requires support from the compiler. It also requires support from the program linker and the dynamic linker. For maximum efficiency–and why do this if you aren’t going to get maximum efficiency?–some kernel support is also needed. The design of TLS on ELF systems fully supports shared libraries, including having multiple shared libraries, and the executable itself, use the same name to refer to a single TLS variable. TLS variables can be initialized. Programs can take the address of a TLS variable, and pass the pointers between threads, so the address of a TLS variable is a dynamic value and must be globally unique.

How is this all implemented? First step: define different storage models for TLS variables.

  • Global Dynamic: Fully general access to TLS variables from an executable or a shared object.
  • Local Dynamic: Permits access to a variable which is bound locally within the executable or shared object from which it is referenced. This is true for all static TLS variables, for example. It is also true for protected symbols–I described those back in part 5.
  • Initial Executable: Permits access to a variable which is known to be part of the TLS image of the executable. This is true for all TLS variables defined in the executable itself, and for all TLS variables in shared libraries explicitly linked with the executable. This is not true for accesses from a shared library, nor for accesses to TLS variables defined in shared libraries opened by dlopen.
  • Local Executable: Permits access to TLS variables defined in the executable itself.

These storage models are defined in decreasing order of flexibility. Now, for efficiency and simplicity, a compiler which supports TLS will permit the developer to specify the appropriate TLS model to use (with gcc, this is done with the -ftls-model option, although the Global Dynamic and Local Dynamic models also require using -fpic). So, when compiling code which will be in an executable and never be in a shared library, the developer may choose to set the TLS storage model to Initial Executable.

Of course, in practice, developers often do not know where code will be used. And developers may not be aware of the intricacies of TLS models. The program linker, on the other hand, knows whether it is creating an executable or a shared library, and it knows whether the TLS variable is defined locally. So the program linker gets the job of automatically optimizing references to TLS variables when possible. These references take the form of relocations, and the linker optimizes the references by changing the code in various ways.

The program linker is also responsible for gathering all TLS variables together into a single TLS segment (I’ll talk more about segments later, for now think of them as a section). The dynamic linker has to group together the TLS segments of the executable and all included shared libraries, resolve the dynamic TLS relocations, and has to build TLS segments dynamically when dlopen is used. The kernel has to make it possible for access to the TLS segments be efficient.

That was all pretty general. Let’s do an example, again for i386 ELF. There are three different implementations of i386 ELF TLS; I’m going to look at the gnu implementation. Consider this trivial code:

__thread int i;
int foo() { return i; }

In global dynamic mode, this generates i386 assembler code like this:

leal i@TLSGD(,%ebx,1), %eax
call ___tls_get_addr@PLT
movl (%eax), %eax

Recall from part 4 that %ebx holds the address of the GOT table. The first instruction will have a R_386_TLS_GD relocation for the variable i; the relocation will apply to the offset of the leal instruction. When the program linker sees this relocation, it will create two consecutive entries in the GOT table for the TLS variable i. The first one will get a R_386_TLS_DTPMOD32 dynamic relocation, and the second will get a R_386_TLS_DTPOFF32 dynamic relocation. The dynamic linker will set the DTPMOD32 GOT entry to hold the module ID of the object which defines the variable. The module ID is an index within the dynamic linker’s tables which identifies the executable or a specific shared library. The dynamic linker will set the DTPOFF32 GOT entry to the offset within the TLS segment for that module. The __tls_get_addr function will use those values to compute the address (this function also takes care of lazy allocation of TLS variables, which is a further optimization specific to the dynamic linker). Note that __tls_get_addr is actually implemented by the dynamic linker itself; it follows that global dynamic TLS variables are not supported (and not necessary) in statically linked executables.

At this point you are probably wondering what is so inefficient aboutpthread_getspecific. The real advantage of TLS shows when you see what the program linker can do. The leal; call sequence shown above is canonical: the compiler will always generate the same sequence to access a TLS variable in global dynamic mode. The program linker takes advantage of that fact. If the program linker sees that the code shown above is going into an executable, it knows that the access does not have to be treated as global dynamic; it can be treated as initial executable. The program linker will actually rewrite the code to look like this:

movl %gs:0, %eax
subl $i@GOTTPOFF(%ebx), %eax

Here we see that the TLS system has coopted the %gs segment register, with cooperation from the operating system, to point to the TLS segment of the executable. For each processor which supports TLS, some such efficiency hack is made. Since the program linker is building the executable, it builds the TLS segment, and knows the offset of i in the segment. The GOTTPOFF is not a real relocation; it is created and then resolved within the program linker. It is, of course, the offset from the GOT table to the address of i in the TLS segment. The movl (%eax), %eax from the original sequence remains to actually load the value of the variable.

Actually, that is what would happen if i were not defined in the executable itself. In the example I showed, i is defined in the executable, so the program linker can actually go from a global dynamic access all the way to a local executable access. That looks like this:

movl %gs:0,%eax
subl $i@TPOFF,%eax

Here i@TPOFF is simply the known offset of i within the TLS segment. I’m not going to go into why this uses subl rather than addl; suffice it to say that this is another efficiency hack in the dynamic linker.

If you followed all that, you’ll see that when an executable accesses a TLS variable which is defined in that executable, it requires two instructions to compute the address, typically followed by another one to actually load or store the value. That is significantly more efficient than calling pthread_getspecific. Admittedly, when a shared library accesses a TLS variable, the result is not much better than pthread_getspecific, but it shouldn’t be any worse, either. And the code using __thread is much easier to write and to read.

That was a real whirlwind tour. There are three separate but related TLS implementations on i386 (known as sun, gnu, and gnu2), and 23 different relocation types are defined. I’m certainly not going to try to describe all the details; I don’t know them all in any case. They all exist in the name of efficient access to the TLS variables for a given storage model.

Is TLS worth the additional complexity in the program linker and the dynamic linker? Since those tools are used for every program, and since the C standard global variable errno in particular can be implemented using TLS, the answer is most likely yes.

Comments (18)

« Previous Page « Previous Page Next entries »