Linkers part 6

So many things to talk about. Let’s go back and cover relocations in some more detail, with some examples.

Relocations

As I said back in part 2, a relocation is a computation to perform on the contents. And as I said yesterday, a relocation can also direct the linker to take other actions, like creating a PLT or GOT entry. Let’s take a closer look at the computation.

In general a relocation has a type, a symbol, an offset into the contents, and an addend.
From the linker’s point of view, the contents are simply an uninterpreted series of bytes. A relocation changes those bytes as necessary to produce the correct final executable. For example, consider the C code g = 0; where g is a global variable. On the i386, the compiler will turn this into an assembly language instruction, which will most likely be movl $0, g (for position dependent code; position independent code would load the address of g from the GOT). Now, the g in the C code is a global variable, and we all more or less know what that means. The g in the assembly code is not that variable. It is a symbol which holds the address of that variable.

The assembler does not know the address of the global variable g, which is another way of saying that the assembler does not know the value of the symbol g. It is the linker that is going to pick that address. So the assembler has to tell the linker that it needs to use the address of g in this instruction. The way the assembler does this is to create a relocation. We don’t use a separate relocation type for each instruction; instead, each processor will have a natural set of relocation types which are appropriate for the machine architecture. Each type of relocation expresses a specific computation.

In the i386 case, the assembler will generate these bytes:

c7 05 00 00 00 00 00 00 00 00

The c7 05 are the instruction (movl constant to address). The first four 00 bytes are the address. The second four 00 bytes are the 32-bit constant 0. The assembler tells the linker to put the value of the symbol g into the address bytes by generating (in this case) an R_386_32 relocation. For this relocation the symbol will be g, the offset will be to the four address bytes immediately following c7 05, the type will be R_386_32, and the addend will be 0 (in the case of the i386 the addend is stored in the contents rather than in the relocation itself, but this is a detail). The type R_386_32 expresses a specific computation, which is: put the 32-bit sum of the value of the symbol and the addend into the offset. Since for the i386 the addend is stored in the contents, this can also be expressed as: add the value of the symbol to the 32-bit field at the offset. When the linker performs this computation, the address in the instruction will be the address of the global variable g. Regardless of the details, the important point to note is that the relocation adjusts the contents by applying a specific computation selected by the type.
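
As a rough illustration of that computation, here is a minimal C sketch of applying an R_386_32-style relocation when the addend is stored in the contents. The helper names (read_le32, write_le32, apply_abs32) and the example address are hypothetical and are not taken from any real linker. In the i386 encoding of this instruction the address field directly follows the c7 05 bytes, so the relocation offset in this example would be 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from the contents. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value into the contents. */
static void write_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: add the symbol value to the 32-bit field at
   `offset`. The field's old value plays the role of the addend. */
static void apply_abs32(unsigned char *contents, size_t size,
                        size_t offset, uint32_t symbol_value) {
    assert(offset <= size && size - offset >= 4); /* field must fit */
    uint32_t addend = read_le32(contents + offset);
    write_le32(contents + offset, symbol_value + addend);
}
```

If the linker assigned g the made-up address 0x0804a010, calling apply_abs32(contents, 10, 2, 0x0804a010) on the ten bytes above would leave c7 05 10 a0 04 08 00 00 00 00.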

An example of a simple case which does use an addend would be


char a[10]; // A global array.
char* p = &a[1]; // In a function.

The assignment to p will wind up requiring a relocation for the symbol a. Here the addend will be 1, so that the resulting instruction references a + 1 rather than a + 0.
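
To make the role of the addend concrete, here is a small C sketch of a relocation record that carries its addend explicitly rather than in the contents. The struct simple_reloc and the function reloc_value are hypothetical simplifications, and the address used for a in the usage note is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation record with an explicit addend. */
struct simple_reloc {
    uint32_t offset;  /* where in the contents the result is stored */
    int32_t addend;   /* constant to add to the symbol value */
};

/* Value that an absolute 32-bit relocation would store: symbol + addend. */
static uint32_t reloc_value(uint32_t symbol_value,
                            const struct simple_reloc *r) {
    assert(r != NULL);
    return symbol_value + (uint32_t)r->addend;
}
```

With an addend of 1 and a placed at the made-up address 0x0804a020, reloc_value yields 0x0804a021, which is the address of a[1].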

To point out how relocations are processor dependent, let’s consider g = 0; on a RISC processor: the PowerPC (in 32-bit mode). In this case, multiple assembly language instructions are required:


li 1,0 // Set register 1 to 0
lis 9,g@ha // Load high-adjusted part of g into register 9
stw 1,g@l(9) // Store register 1 to address in register 9 plus low part of g

The lis instruction loads a value into the upper 16 bits of register 9, setting the lower 16 bits to zero. The stw instruction adds a signed 16-bit value to register 9 to form an address, and then stores the value of register 1 at that address. The @ha part of the operand directs the assembler to generate an R_PPC_ADDR16_HA reloc. The @l produces an R_PPC_ADDR16_LO reloc. The goal of these relocs is to compute the value of the symbol g and use it as the store address.

That is enough information to determine the computations performed by these relocs. The R_PPC_ADDR16_HA reloc computes (SYMBOL >> 16) + ((SYMBOL & 0x8000) ? 1 : 0). The R_PPC_ADDR16_LO computes SYMBOL & 0xffff. The extra computation for R_PPC_ADDR16_HA is because the stw instruction adds the signed 16-bit value, which means that if the low 16 bits appear negative we have to adjust the high 16 bits accordingly. The offsets of the relocations are such that the 16-bit resulting values are stored into the appropriate parts of the machine instructions.
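
The two formulas can be checked with a short C sketch. The function names here (ppc_ha, ppc_lo, ppc_effective_address, check_split) are hypothetical. The sketch only models the arithmetic of lis followed by a signed 16-bit displacement; it does not model instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* High-adjusted half: add 1 when the low half will be sign-extended
   as a negative number. The result is truncated to 16 bits. */
static uint16_t ppc_ha(uint32_t symbol) {
    return (uint16_t)((symbol >> 16) + ((symbol & 0x8000) ? 1 : 0));
}

/* Low half: simply the low 16 bits. */
static uint16_t ppc_lo(uint32_t symbol) {
    return (uint16_t)(symbol & 0xffff);
}

/* Address formed by lis (ha goes into the upper 16 bits) plus the
   sign-extended 16-bit displacement used by stw. */
static uint32_t ppc_effective_address(uint16_t ha, uint16_t lo) {
    uint32_t high = (uint32_t)ha << 16;
    int32_t low = (lo & 0x8000) ? (int32_t)lo - 0x10000 : (int32_t)lo;
    return high + (uint32_t)low;
}

/* Splitting a symbol value and recombining it should give it back. */
static void check_split(uint32_t symbol) {
    assert(ppc_effective_address(ppc_ha(symbol), ppc_lo(symbol)) == symbol);
}
```

For a made-up symbol value of 0x10018000, the high-adjusted half is 0x1002 and the low half is 0x8000. The stw displacement is then treated as -0x8000, and 0x10020000 - 0x8000 gives back 0x10018000.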

The specific examples of relocations I’ve discussed here are ELF specific, but the same sorts of relocations occur for any object file format.

The examples I’ve shown are for relocations which appear in an object file. As discussed in part 4, these types of relocations may also appear in a shared library, if they are copied there by the program linker. In ELF, there are also specific relocation types which never appear in object files but only appear in shared libraries or executables. These are the JMP_SLOT, GLOB_DAT, and RELATIVE relocations discussed earlier. Another type of relocation which only appears in an executable is a COPY relocation, which I will discuss later.

Position Dependent Shared Libraries

I realized that in part 4 I forgot to say one of the important reasons that ELF shared libraries use PLT and GOT tables. The idea of a shared library is to permit mapping the same shared library into different processes. This only works at maximum efficiency if the shared library code looks the same in each process. If it does not look the same, then each process will need its own private copy, and the savings in physical memory and sharing will be lost.

As discussed in part 4, when the dynamic linker loads a shared library which contains position dependent code, it must apply a set of dynamic relocations. Those relocations will change the code in the shared library, and it will no longer be sharable.

The advantage of the PLT and GOT is that they move the relocations elsewhere, to the PLT and GOT tables themselves. Those tables can then be put into a read-write part of the shared library. This part of the shared library will be much smaller than the code. The PLT and GOT tables will be different in each process using the shared library, but the code will be the same.
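
The idea can be sketched in C as an analogy, using a function pointer as a stand-in for a single table entry. This is not how a real PLT or GOT is laid out; the names got_slot, resolve_slot, call_through_slot, impl_one and impl_two are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one table entry: writable data, private to each process. */
static int (*got_slot)(int) = NULL;

/* Stand-in for the dynamic linker: it patches the data slot, not the code. */
static void resolve_slot(int (*target)(int)) {
    got_slot = target;
}

/* Stand-in for shared library code. It never changes, whatever address
   ends up in the slot, so it could be shared between processes. */
static int call_through_slot(int arg) {
    assert(got_slot != NULL); /* slot must be resolved before use */
    return got_slot(arg);
}

/* Two hypothetical definitions that different processes might bind to. */
static int impl_one(int x) { return x + 1; }
static int impl_two(int x) { return x * 2; }
```

Calling resolve_slot with impl_one or with impl_two changes what call_through_slot does without touching its code, which is the property that lets the code pages stay identical in every process.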

I’ll be taking a vacation for the long weekend. My next post will most likely be on Tuesday.


Comments

10 responses to “Linkers part 6”

  1. ncm

    I’m hoping your linker will implement the omit-uncalled-virtuals optimization, as implemented in Symantec’s linker years back. Naive linkers see the reference to a virtual function implementation in a virtual function table, itself referenced in a constructor, and link the function even though that function cannot be called by the program. You can tell because that offset into the vtable is never used. You can be smarter: that offset is never used with a static “this” type at or below it in the derivation hierarchy. You can be smarter yet: if the “this” type is below it, and that type or one on the way there provides its own implementation, that can’t call yours.

    It’s tempting to argue that virtual functions are all in shared libraries, these days, or that program size doesn’t matter any more, or that virtual functions aren’t so important any more. However, big programs and embedded programs are often linked statically, and cache/VM footprint still matters, and people still insist on making derivation hierarchies.

  2. […] Linkers part 6 – Relocations, Position Dependent Shared Libraries. […]

  3. Ian Lance Taylor

    The current GNU linker implemented that optimization for a while, using special relocation types to indicate virtual function calls and the class hierarchy. This information was fed into the garbage collector. This was implemented for eCos. I don’t think anybody really uses it, though, and I don’t know whether it still works correctly.

    Using relocation types was the wrong approach. It should be done using a separate side table in an unloaded section. In any case, this optimization requires cooperation with the compiler. It’s not particularly hard to implement in the linker as part of a garbage collector to discard unreferenced sections.

  4. ncm

    The optimization would have to be the default, because nobody would know about it; or, if they did, they’d need apparatus in configure to tell whether it was there and turn it on.

    Is there something unsafe about the optimization? I suppose a program could dlopen a library that doesn’t construct a type T, but uses a virtual member of T not referenced in the main program. You’d like to get an unresolved-symbol error at dlopen time, then, just as when the library references a regular symbol that’s not present. But you’d also like to have a way to tell the program linker to retain unused virtuals meant to be available for use by dlopened libraries.

  5. jlh

    In the last paragraph, you say that Position Independent Code moves the relocation targets elsewhere, in GOT and PLT. To be more precise, I would say that the relocation targets are moved to the GOT only, which lives in the read/write segment of the memory. The PLT lives in the read-only segment and leverages the GOT to store the resolved function address.

    At least, that’s what I know about PLT and GOT internals on i386, but I think it is the same on other architectures.

  6. Ian Lance Taylor

    jlh: The 64-bit PowerPC, for example, uses a different scheme. The PLT is not initialized by ld, and lives in uninitialized writable memory. The R_PPC64_JMP_SLOT reloc refers to the PLT.

  7. AR

    Why wouldn’t ld fill in a PLT (or GOT) calculated for a given preferred load address?

    If the preferred address matches the actual load address, ldd need not do anything; all symbols are already resolved. The dynamic linker would, however, have to check several things: the library must be exactly the same as the one used for program linking, and it would also have to check whether LD_PRELOAD is specified; maybe a few other checks, but in most cases I would expect a match and thus pre-linking would be beneficial.

    For PLT in shared libraries themselves, the same thing could be applied.

    To avoid address collisions and thus relocations, something similar to Windows ‘rebase’ could be used to rewrite the PLT (or GOT) for the given load address for the most common libraries.

  8. Ian Lance Taylor

    Thanks for the comment.

    This is often called prelinking. On GNU/Linux the prelink tool will do this. It’s also possible to do it in ld itself; in fact, I once implemented it in GNU ld for x86, although the patches were sufficiently complex, and the advantage sufficiently small, that I didn’t contribute them back. The main advantage that prelink has over ld is that prelink can look at all the libraries at once, but ld necessarily sees them one at a time. To make it work you need to put the shared libraries at nonoverlapping addresses. There are a number of complexities which arise, such as symbols defined in multiple shared libraries. These complexities mean that the dynamic linker still has to do something in some cases.

  9. Jay

    When a loader is performing dynamic relocations, it has to work out a physical address for symbol + addend. I’m assuming it does this by looking up a segment and working out what address that segment is loaded at. So does it use the segment containing the symbol’s vaddr, or the segment containing (symbol’s vaddr + addend)?

  10. Ian Lance Taylor

    Most of the standard PLT and GOT relocations do not use an addend. That said, for those relocations where an addend is used, the dynamic linker will generally look up the symbol value and then apply the addend to the final value.
