I recently added support for STT_GNU_IFUNC to gold.
As you can tell by the name, STT_GNU_IFUNC is a GNU-specific ELF symbol type. It is defined in the range reserved for operating-system-specific types. A symbol with type STT_GNU_IFUNC is a function, but the symbol does not provide the address of the function as usual. Instead, the symbol provides the address of a function which returns a pointer to the actual function. The idea is that this lets a program use a version of a function which is customized for a particular system. A typical example would be memcpy; the program would choose a version of memcpy optimized for the current CPU. This is in fact what happens in the mainline version of glibc.
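For illustration, here is roughly what defining such a symbol looks like at the source level. This is a minimal sketch, assuming GCC (or a compatible compiler) targeting ELF with glibc, and using the ifunc function attribute. The names my_square, square_generic, square_tuned and resolve_my_square are made up for the example, and the selection test is only a stand-in for a real CPU feature check.

```c
/* Two hypothetical implementations of the same function. */
static int square_generic(int x) { return x * x; }
static int square_tuned(int x)   { return x * x; }  /* stand-in for a CPU-specific version */

/* The resolver: the STT_GNU_IFUNC symbol's value is the address of this
   function. It runs while relocations are processed and returns a pointer
   to the implementation to use. A real resolver would inspect CPU
   features here; this placeholder test just keeps the sketch portable. */
static int (*resolve_my_square(void))(int)
{
    return sizeof(void *) >= 8 ? square_tuned : square_generic;
}

/* Declares my_square with symbol type STT_GNU_IFUNC. */
int my_square(int x) __attribute__((ifunc("resolve_my_square")));
```

Callers simply call my_square(7); the indirection through the resolver is invisible to them.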
STT_GNU_IFUNC is implemented with a little bit of support in the compiler and assembler, basically just enough to set the symbol type. The rest of the implementation is in the linker and the dynamic linker. Here I’ll discuss the support required in the linker.
An STT_GNU_IFUNC symbol always uses a PLT entry, and all references to the symbol go through the PLT. This is true even for a local symbol, although local symbols normally do not require PLT entries. The PLT entry will refer to a GOT entry as usual. The GOT entry will be given an IRELATIVE reloc (a new reloc type) rather than the usual JUMP_SLOT reloc. For a normal PLT entry the GOT entry is initialized to point to the PLT entry, in a way that causes the PLT entry to initialize itself the first time it is called. For an STT_GNU_IFUNC PLT entry, the GOT entry instead points to the symbol’s value, which is the function to call to get the real function address.
Normally, when we need a GOT entry for a function because position independent code takes its address, and we know the value of the function because we are, say, linking the executable, we can set the GOT entry to the value of the symbol. But for an STT_GNU_IFUNC symbol, that would mean that calling the function pointer would return the function to call, rather than calling the actual function. So instead we set the GOT entry to point to the symbol’s PLT entry.
When a global or static variable in a position independent executable is initialized to the address of a local function, we would normally use a RELATIVE reloc. For an STT_GNU_IFUNC symbol, we instead use an IRELATIVE reloc. That will match the address seen in a shared library.
A statically linked executable normally does not have any dynamic relocs. In order to make the IRELATIVE relocs work in a static executable, they are grouped together with symbols to mark the beginning and end of the group. The glibc startup code then uses those symbols to resolve all IRELATIVE relocs when the program starts. The symbols are __rel_iplt_start and __rel_iplt_end. For a target which uses SHT_RELA relocs, the symbol names use rela instead of rel.
That is pretty much it for the linker support.
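To make the static case concrete, here is a simplified model of what the startup code does with the group of IRELATIVE relocs. It is a sketch, not the glibc code: struct irel_entry is a hypothetical stand-in for the real ELF reloc record, and the table is an ordinary array rather than the range delimited by the linker-provided symbols. It also assumes a function address fits in a uintptr_t, which holds on the usual ELF targets.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for one IRELATIVE reloc. */
struct irel_entry {
    uintptr_t *slot;              /* location to patch, e.g. a GOT entry */
    uintptr_t (*resolver)(void);  /* function returning the real address */
};

/* Walk the range [start, end) and store each resolver's result in its slot.
   In a real static executable the range would be bounded by the start and
   end symbols that the linker defines around the reloc group. */
static void apply_irelative_range(const struct irel_entry *start,
                                  const struct irel_entry *end)
{
    for (const struct irel_entry *r = start; r < end; ++r)
        *r->slot = r->resolver();
}

/* Demo data: one function, its resolver, and the slot it is called through. */
static int demo_impl(int x) { return x + 1; }
static uintptr_t demo_resolver(void) { return (uintptr_t)demo_impl; }

static uintptr_t demo_got_slot;
static const struct irel_entry demo_table[] = {
    { &demo_got_slot, demo_resolver },
};

/* Plays the role of the PLT entry: an indirect call through the slot. */
static int call_through_slot(int x)
{
    return ((int (*)(int))demo_got_slot)(x);
}
```

After apply_irelative_range(demo_table, demo_table + 1) has run, call_through_slot reaches demo_impl directly, with no further involvement from the resolver.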
Ideally this would permit STT_GNU_IFUNC symbols to be initialized lazily: the first time the function is called, the PLT entry would jump to the function which returns the real function address to use, and that would be stored into the corresponding GOT entry for future calls through the PLT. However, there is no code there to actually update the GOT entry. Normal PLT initialization calls into the dynamic linker; this process would not. It might be possible to make this work, but the current glibc dynamic linker does not try. Instead, all the IRELATIVE relocs are processed when the program starts up, even if they are in the set of relocs which are normally resolved lazily.
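The lazy scheme described above can be modeled in plain C to show which step is missing. This is only an illustration with made-up names (triple, triple_slot, and so on), not how any real PLT works: the slot starts out pointing at a first-call stub, and the stub performs the update of the slot that nothing performs in the real implementation.

```c
typedef int (*op_fn)(int);

static int lazy_resolver_calls;  /* counts resolver runs, for the demo only */

static int triple_impl(int x) { return 3 * x; }

/* Plays the role of the function the STT_GNU_IFUNC symbol points to. */
static op_fn triple_resolver(void)
{
    ++lazy_resolver_calls;
    return triple_impl;
}

static int triple_first_call(int x);

/* Plays the role of the GOT entry; initially points at the stub. */
static op_fn triple_slot = triple_first_call;

/* First-call stub: resolve, store the result in the slot, then call it.
   The store is the step that has no counterpart in the real scheme. */
static int triple_first_call(int x)
{
    triple_slot = triple_resolver();
    return triple_slot(x);
}

/* Plays the role of the PLT entry: always an indirect call via the slot. */
int triple(int x) { return triple_slot(x); }
```

The resolver runs on the first call to triple only; every later call goes straight through the slot to triple_impl.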
That is all a bit of a pain, but overall it’s not too hard to implement. Is it really worth it? Calling an STT_GNU_IFUNC function will jump to the PLT entry, which will load a value from the GOT and jump to it. On x86 this is a single indirect jump instruction. For the CPU this requires loading the indirect jump instruction, loading the actual address, and executing the jump.
An alternative to all this linker stuff would be a variable holding a function pointer. The function could then be written in assembler to do the indirect jump. The variable would be initialized at program startup time. The efficiency would be the same. The address of the function would be the address of the indirect jump, so function pointers would compare consistently. Of course it wouldn’t be quite so automatic, but on the other hand it wouldn’t require special support in the compiler, assembler, linker, and dynamic linker. I’m not sure why that approach was not taken. There may be a good reason.
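A rough sketch of that alternative, written in C rather than assembler, might look like the following. All of the names are hypothetical, the selection test is a placeholder for a real CPU check, and the initialization is an explicit call here; a real program would arrange for it to run at startup, for example from a constructor. A compiler will typically turn the wrapper into an indirect jump when optimizing, but only assembler would guarantee that.

```c
typedef unsigned long (*sum_fn)(const unsigned char *, unsigned long);

/* Two hypothetical implementations. */
static unsigned long sum_bytes_generic(const unsigned char *p, unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; ++i)
        total += p[i];
    return total;
}

static unsigned long sum_bytes_tuned(const unsigned char *p, unsigned long n)
{
    /* Stand-in for a CPU-specific version; same result as the generic one. */
    return sum_bytes_generic(p, n);
}

/* The variable holding the function pointer. */
static sum_fn sum_bytes_ptr;

/* Must run once before the first call to sum_bytes. */
void sum_bytes_init(void)
{
    /* Placeholder selection; a real version would test CPU features. */
    sum_bytes_ptr = sizeof(void *) >= 8 ? sum_bytes_tuned : sum_bytes_generic;
}

/* The public function: its address is stable, so pointers to it compare
   consistently everywhere, and its body is just the indirect call. */
unsigned long sum_bytes(const unsigned char *p, unsigned long n)
{
    return sum_bytes_ptr(p, n);
}
```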