Archive for September, 2007

Linkers part 13

Symbol Versions Redux

I’ve talked about symbol versions from the linker’s point of view. I think it’s worth discussing them a bit from the user’s point of view.

As I’ve discussed before, symbol versions are an ELF extension designed to solve a specific problem: making it possible to upgrade a shared library without changing existing executables. That is, they provide backward compatibility for shared libraries. There are a number of related problems which symbol versions do not solve. They do not provide forward compatibility for shared libraries: if you upgrade your executable, you may need to upgrade your shared library also (it would be nice to have a feature to build your executable against an older version of the shared library, but that is difficult to implement in practice). They only work at the shared library interface: they do not help with a change to the ABI of a system call, which is at the kernel interface. They do not help with the problem of sharing incompatible versions of a shared library, as may happen when a complex application is built out of several different existing shared libraries which have incompatible dependencies.

Despite these limitations, shared library backward compatibility is an important issue. Using symbol versions to ensure backward compatibility requires a careful and rigorous approach. You must start by applying a version to every symbol. If a symbol in the shared library does not have a version, then it is impossible to change it in a backward compatible fashion. Then you must pay close attention to the ABI of every symbol. If the ABI of a symbol changes for any reason, you must provide a copy which implements the old ABI. That copy should be marked with the original version. The new symbol must be given a new version.

The ABI of a symbol can change in a number of ways. Any change to the parameter types or the return type of a function is an ABI change. Any change in the type of a variable is an ABI change. If a parameter or a return type is a struct or class, then any change in the type of any field is an ABI change–i.e., if a field in a struct points to another struct, and that struct changes, the ABI has changed. If a function is defined to return an instance of an enum, and a new value is added to the enum, that is an ABI change. In other words, even minor changes can be ABI changes. The question you need to ask is: can existing code which has already been compiled continue to use the new symbol with no change? If the answer is no, you have an ABI change, and you must define a new symbol version.

You must be very careful when writing the symbol implementing the old ABI, if you don’t just copy the existing code. You must be certain that it really does implement the old ABI.

There are some special challenges when using C++. Adding a new virtual method to a class can be an ABI change for any function which uses that class. Providing the backward compatible version of the class in such a situation is very awkward–there is no natural way to specify the name and version to use for the virtual table or the RTTI information for the old version.

Naturally, you must never delete any symbols.

Getting all the details correct, and verifying that you got them correct, requires great attention to detail. Unfortunately, I don’t know of any tools to help people write correct version scripts, or to verify them. Still, if implemented correctly, the results are good: existing executables will continue to run.

Static Linking vs. Dynamic Linking

There is, of course, another way to ensure that existing executables will continue to run: link them statically, without using any shared libraries. That will limit their ABI issues to the kernel interface, which is normally significantly smaller than the library interface.

There is a performance tradeoff with static linking. A statically linked program does not get the benefit of sharing libraries with other programs executing at the same time. On the other hand, a statically linked program does not have to pay the performance penalty of position independent code when executing within the library.

Upgrading the shared library is only possible with dynamic linking. Such an upgrade can provide bug fixes and better performance. Also, the dynamic linker can select a version of the shared library appropriate for the specific platform, which can also help performance.

Static linking permits more reliable testing of the program. You only need to worry about kernel changes, not about shared library changes.

Some people argue that dynamic linking is always superior. I think there are benefits on both sides, and which choice is best depends on the specific circumstances.

More on Monday. If you think I should write about any specific linker related topics which have not already been mentioned in the comments, please let me know.

Comments (2)

Linkers part 12

I apologize for the pause in posts. We moved over the weekend. Last Friday at&t told me that the new DSL was working at our new house. However, it did not actually start working outside the house until Wednesday. Then a problem with the internal wiring meant that it was not working inside the house until today. I am now finally back online at home.

Symbol Resolution

I find that symbol resolution is one of the trickier aspects of a linker. Symbol resolution is what the linker does the second and subsequent times that it sees a particular symbol. I’ve already touched on the topic in a few previous entries, but let’s look at it in a bit more depth.

Some symbols are local to a specific object files. We can ignore these for the purposes of symbol resolution, as by definition the linker will never see them more than once. In ELF these are the symbols with a binding of STB_LOCAL.

In general, symbols are resolved by name: every symbol with the same name is the same entity. We’ve already seen a few exceptions to that general rule. A symbol can have a version: two symbols with the same name but different versions are different symbols. A symbol can have non-default visibility: a symbol with hidden visibility in one shared library is not the same as a symbol with the same name in a different shared library.

The characteristics of a symbol which matter for resolution are:

  • The symbol name
  • The symbol version.
  • Whether the symbol is the default version or not.
  • Whether the symbol is a definition or a reference or a common symbol.
  • The symbol visibility.
  • Whether the symbol is weak or strong (i.e., non-weak).
  • Whether the symbol is defined in a regular object file being included in the output, or in a shared library.
  • Whether the symbol is thread local.
  • Whether the symbol refers to a function or a variable.

The goal of symbol resolution is to determine the final value of the symbol. After all symbols are resolved, we should know the specific object file or shared library which defines the symbol, and we should know the symbol’s type, size, etc. It is possible that some symbols will remain undefined after all the symbol tables have been read; in general this is only an error if some relocation refers to that symbol.

At this point I’d like to present a simple algorithm for symbol resolution, but I don’t think I can. I’ll try to hit all the high points, though. Let’s assume that we have two symbols with the same name. Let’s call the symbol we saw first A and the new symbol B. (I’m going to ignore symbol visibility in the algorithm below; the effects of visibility should be obvious, I hope.)

  1. If A has a version:
    • If B has a version different from A, they are actually different symbols.
    • If B has the same version as A, they are the same symbol; carry on.
    • If B does not have a version, and A is the default version of the symbol, they are the same symbol; carry on.
    • Otherwise B is probably a different symbol. But note that if A and B are both undefined references, then it is possible that A refers to the default version of the symbol but we don’t yet know that. In that case, if B does not have a version, A and B really are the same symbol. We can’t tell until we see the actual definition.
  2. If A does not have a version:
    • If B does not have a version, they are the same symbol; carry on.
    • If B has a version, and it is the default version, they are the same symbol; carry on.
    • Otherwise, B is probably a different symbol, as above.
  3. If A is thread local and B is not, or vice-versa, then we have an error.
  4. If A is an undefined reference:
    • If B is an undefined reference, then we can complete the resolution, and more or less ignore B.
    • If B is a definition or a common symbol, then we can resolve A to B.
  5. If A is a strong definition in an object file:
    • If B is an undefined reference, then we resolve B to A.
    • If B is a strong definition in an object file, then we have a multiple definition error.
    • If B is a weak definition in an object file, then A overrides B. In effect, B is ignored.
    • If B is a common symbol, then we treat B as an undefined reference.
    • If B is a definition in a shared library, then A overrides B. The dynamic linker will change all references to B in the shared library to refer to A instead.
  6. If A is a weak definition in an object file, we act just like the strong definition case, with one exception: if B is a strong definition in an object file. In the original SVR4 linker, this case was treated as a multiple definition error. In the Solaris and GNU linkers, this case is handled by letting B override A.
  7. If A is a common symbol in an object file:
    • If B is a common symbol, we set the size of A to be the maximum of the size of A and the size of B, and then treat B as an undefined reference.
    • If B is a definition in a shared library with function type, then A overrides B (this oddball case is required to correctly handle some Unix system libraries).
    • Otherwise, we treat A as an undefined reference.
  8. If A is a definition in a shared library, then if B is a definition in a regular object (strong or weak), it overrides A. Otherwise we act as though A were defined in an object file.
  9. If A is a common symbol in a shared library, we have a funny case. Symbols in shared libraries must have addresses, so they can’t be common in the same sense as symbols in an object file. But ELF does permit symbols in a shared library to have the type STT_COMMON (this is a relatively recent addition). For purposes of symbol resolution, if A is a common symbol in a shared library, we still treat it as a definition, unless B is also a common symbol. In the latter case, B overrides A, and the size of B is set to the maximum of the size of A and the size of B.

I hope I got all that right.

More tomorrow, assuming the Internet connection holds up.

Comments (2)

Linkers part 11

Archives

Archives are a traditional Unix package format. They are created by the ar program, and they are normally named with a .a extension. Archives are passed to a Unix linker with the -l option.

Although the ar program is capable of creating an archive from any type of file, it is normally used to put object files into an archive. When it is used in this way, it creates a symbol table for the archive. The symbol table lists all the symbols defined by any object file in the archive, and for each symbol indicates which object file defines it. Originally the symbol table was created by the ranlib program, but these days it is always created by ar by default (despite this, many Makefiles continue to run ranlib unnecessarily).

When the linker sees an archive, it looks at the archive’s symbol table. For each symbol the linker checks whether it has seen an undefined reference to that symbol without seeing a definition. If that is the case, it pulls the object file out of the archive and includes it in the link. In other words, the linker pulls in all the object files which defines symbols which are referenced but not yet defined.

This operation repeats until no more symbols can be defined by the archive. This permits object files in an archive to refer to symbols defined by other object files in the same archive, without worrying about the order in which they appear.

Note that the linker considers an archive in its position on the command line relative to other object files and archives. If an object file appears after an archive on the command line, that archive will not be used to defined symbols referenced by the object file.

In general the linker will not include archives if they provide a definition for a common symbol. You will recall that if the linker sees a common symbol followed by a defined symbol with the same name, it will treat the common symbol as an undefined reference. That will only happen if there is some other reason to include the defined symbol in the link; the defined symbol will not be pulled in from the archive.

There was an interesting twist for common symbols in archives on old a.out-based SunOS systems. If the linker saw a common symbol, and then saw a common symbol in an archive, it would not include the object file from the archive, but it would change the size of the common symbol to the size in the archive if that were larger than the current size. The C library relied on this behaviour when implementing the stdin variable.

My next posting should be on Monday.

Comments (11)

Linkers part 10

Parallel Linking

It is possible to parallelize the linking process somewhat. This can help hide I/O latency and can take better advantage of modern multi-core systems. My intention with gold is to use these ideas to speed up the linking process.

The first area which can be parallelized is reading the symbols and relocation entries of all the input files. The symbols must be processed in order; otherwise, it will be difficult for the linker to resolve multiple definitions correctly. In particular all the symbols which are used before an archive must be fully processed before the archive is processed, or the linker won’t know which members of the archive to include in the link (I guess I haven’t talked about archives yet). However, despite these ordering requirements, it can be beneficial to do the actual I/O in parallel.

After all the symbols and relocations have been read, the linker must complete the layout of all the input contents. Most of this can not be done in parallel, as setting the location of one type of contents requires knowing the size of all the preceding types of contents. While doing the layout, the linker can determine the final location in the output file of all the data which needs to be written out.

After layout is complete, the process of reading the contents, applying relocations, and writing the contents to the output file can be fully parallelized. Each input file can be processed separately.

Since the final size of the output file is known after the layout phase, it is possible to use mmap for the output file. When not doing relaxation, it is then possible to read the input contents directly into place in the output file, and to relocation them in place. This reduces the number of system calls required, and ideally will permit the operating system to do optimal disk I/O for the output file.

Just a short entry tonight. More tomorrow.

Comments

Linkers part 9

Symbol Versions

A shared library provides an API. Since executables are built with a specific set of header files and linked against a specific instance of the shared library, it also provides an ABI. It is desirable to be able to update the shared library independently of the executable. This permits fixing bugs in the shared library, and it also permits the shared library and the executable to be distributed separately. Sometimes an update to the shared library requires changing the API, and sometimes changing the API requires changing the ABI. When the ABI of a shared library changes, it is no longer possible to update the shared library without updating the executable. This is unfortunate.

For example, consider the system C library and the stat function. When file systems were upgraded to support 64-bit file offsets, it became necessary to change the type of some of the fields in the stat struct. This is a change in the ABI of stat. New versions of the system library should provide a stat which returns 64-bit values. But old existing executables call stat expecting 32-bit values. This could be addressed by using complicated macros in the system header files. But there is a better way.

The better way is symbol versions, which were introduced at Sun and extended by the GNU tools. Every shared library may define a set of symbol versions, and assign specific versions to each defined symbol. The versions and symbol assignments are done by a script passed to the program linker when creating the shared library.

When an executable or shared library A is linked against another shared library B, and A refers to a symbol S defined in B with a specific version, the undefined dynamic symbol reference S in A is given the version of the symbol S in B. When the dynamic linker sees that A refers to a specific version of S, it will link it to that specific version in B. If B later introduces a new version of S, this will not affect A, as long as B continues to provide the old version of S.

For example, when stat changes, the C library would provide two versions of stat, one with the old version (e.g., LIBC_1.0), and one with the new version (LIBC_2.0). The new version of stat would be marked as the default–the program linker would use it to satisfy references to stat in object files. Executables linked against the old version would require the LIBC_1.0 version of stat, and would therefore continue to work. Note that it is even possible for both versions of stat to be used in a single program, accessed from different shared libraries.

As you can see, the version effectively is part of the name of the symbol. The biggest difference is that a shared library can define a specific version which is used to satisfy an unversioned reference.

Versions can also be used in an object file (this is a GNU extension to the original Sun implementation). This is useful for specifying versions without requiring a version script. When a symbol name containts the @ character, the string before the @ is the name of the symbol, and the string after the @ is the version. If there are two consecutive @ characters, then this is the default version.

Relaxation

Generally the program linker does not change the contents other than applying relocations. However, there are some optimizations which the program linker can perform at link time. One of them is relaxation.

Relaxation is inherently processor specific. It consists of optimizing code sequences which can become smaller or more efficient when final addresses are known. The most common type of relaxation is for call instructions. A processor like the m68k supports different PC relative call instructions: one with a 16-bit offset, and one with a 32-bit offset. When calling a function which is within range of the 16-bit offset, it is more efficient to use the shorter instruction. The optimization of shrinking these instructions at link time is known as relaxation.

Relaxation is applied based on relocation entries. The linker looks for relocations which may be relaxed, and checks whether they are in range. If they are, the linker applies the relaxation, probably shrinking the size of the contents. The relaxation can normally only be done when the linker recognizes the instruction being relocated. Applying a relaxation may in turn bring other relocations within range, so relaxation is typically done in a loop until there are no more opportunities.

When the linker relaxes a relocation in the middle of a contents, it may need to adjust any PC relative references which cross the point of the relaxation. Therefore, the assembler needs to generate relocation entries for all PC relative references. When not relaxing, these relocations may not be required, as a PC relative reference within a single contents will be valid whereever the contents winds up. When relaxing, though, the linker needs to look through all the other relocations that apply to the contents, and adjust PC relatives one where appropriate. This adjustment will simply consist of recomputing the PC relative offset.

Of course it is also possible to apply relaxations which do not change the size of the contents. For example, on the MIPS the position independent calling sequence is normally to load the address of the function into the $25 register and then to do an indirect call through the register. When the target of the call is within the 18-bit range of the branch-and-call instruction, it is normally more efficient to use branch-and-call, since then the processor does not have to wait for the load of $25 to complete before starting the call. This relaxation changes the instruction sequence without changing the size.

More tomorrow. I apologize for the haphazard arrangement of these linker notes. I’m just writing about ideas as I think of them, rather than being organized about that. If I do collect these notes into an essay, I’ll try to make them more structured.

Comments (6)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »