Archive for January, 2010

Version Scripts

I recently spent some time sorting through linker version script issues, so I’m going to document what I discovered.

Linker symbol versioning was invented at Sun. The Solaris linker lets you use a version script when you create a shared library. This script assigns versions to specific named symbols, and defines a version hierarchy. When an executable is linked against the shared library, the versions that it uses are recorded in the executable. If you later try to dynamically link the executable with a shared library which does not provide the required versions, you get a sensible error message.

Sun’s scheme (as I understand it) only permits you to add new versions and new symbols. Once a symbol has been defined at a specific version, you can not change that in later releases. if you change the behaviour of a symbol, you don’t change the version of the symbol itself, instead you add a new version to the library even if it does not define any symbols. That is sufficient to ensure that an executable will not be dynamically linked against a version of the shared library which is too old.

Eric Youngdale and Ulrich Drepper introduced a more sophisticated symbol versioning scheme in the GNU linker and the GNU/Linux dynamic linker. The GNU linker permits symbols to have multiple versions, of which only one is the default. These versions are specified in the object files linked together to form the shared library. The assembler .symver directive is used to assign a version to a symbol (the version is simply encoded in the name of the symbol). This scheme permits using symbol versioning to actually change the behaviour of a symbol; older executables will continue to use the old version. This also permits deleting symbols, by removing the default version. The older versions of the symbol remain but are inaccessible.

That is all fine. The problems come in with the extensions to the version script language. First, the GNU linker permits wildcards in version scripts. Second, the GNU linker permits symbols to match against demangled names, again typically using wildcards. Third, the GNU linker permits the version script to hide symbols which have explicit versions in input object files.

Every symbol can only have one version. When the linker asks for the version of a symbol, there can only be one answer. The support for wildcards and matching of demangled names in the GNU linker script means that there may not be a unique answer for the version to use for a given name. The fact that the GNU linker permits version scripts to hide symbols with explicit versions means that in some cases you absolutely must list a symbol two times in a version script (because you might have a local: *; entry which must not match your symbol with an old version). This potential confusion means that using linker scripts correctly with wildcards requires a clear understanding of exactly how the linker parses a version script.

Unfortunately, this was never documented. Until now. Here are the rules which the GNU linker uses to parse version scripts, as of 2010-01-11.

The GNU linker walks through the version tags in the order in which they appear in the version script. For each tag, it first walks through the global patterns for that tag, then the local patterns. When looking at a single pattern, it first applies any language specific demangling as specified for the pattern, and then matches the resulting symbol name to the pattern. If it finds an exact match for a literal pattern (a pattern enclosed in quotes or with no wildcard characters), then that is the match that it uses. If finds a match with a wildcard pattern, then it saves it and continues searching. Wildcard patterns that are exactly “*” are saved separately.

If no exact match with a literal pattern is ever found, then if a wildcard match with a global pattern was found it is used, otherwise if a wildcard match with a local pattern was found it is used.

This is the result:

  • If there is an exact match, then we use the first tag in the version script where it matches.
    • If the exact match in that tag is global, it is used.
    • Otherwise the exact match in that tag is local, and is used.
  • Otherwise, if there is any match with a global wildcard pattern:
    • If there is any match with a wildcard pattern which is not “*”, then we use the tag in which the last such pattern appears.
    • Otherwise, we matched “*”. If there is no match with a local wildcard pattern which is not “*”, then we use the last match with a global “*”. Otherwise, continue.
  • Otherwise, if there is any match with a local wildcard pattern:
    • If there is any match with a wildcard pattern which is not “*”, then we use the tag in which the last such pattern appears.
    • Otherwise, we matched “*”, and we use the tag in which the last such match occurred.

As mentioned above, there is an additional wrinkle. When the GNU linker finds a symbol with a version defined in an object file due to a .symver directive, it looks up that symbol name in that version tag. If it finds it, it matches the symbol name against the patterns for that version. If there is no match with a global pattern, but there is a match with a local pattern, then the GNU linker marks the symbol as local.

I want gold to be compatible, but I also want gold to be efficient. I’ve introduced a hash table in gold to do fast lookups for exact matches. That makes it impossible for gold to follow the exact rules when matching demangled names. Currently gold does not do the final lookup to see if a symbol with an explicit version should be forced local; I don’t understand why that is useful. It is possible that I will be forced to add that to gold at some later date.

Here are the current rules for gold:

  • If there is an exact match for the mangled name, we use it.
    • If there is more than one exact match, we give a warning, and we use the first tag in the script which matches.
    • If a symbol has an exact match as both global and local for the same version tag, we give an error.
  • Otherwise, we look for an extern C++ or an extern Java exact match. If we find an exact match, we use it.
    • If there is more than one exact match, we give a warning, and we use the first tag in the script which matches.
    • If a symbol has an exact match as both global and local for the same version tag, we give an error.
  • Otherwise, we look through the wildcard patterns, ignoring “*” patterns. We look through the version tags in reverse order. For each version tag, we look through the global patterns and then the local patterns. We use the first match we find (i.e., the last matching version tag in the file).
  • Otherwise, we use the “*” pattern if there is one. We give a warning if there are multiple “*” patterns.

I hope for your sake that this information never actually matters to you.

Comments

AOL Time Warner

The merger between AOL and Time Warner, the biggest merger in U.S. history, happened ten years ago. It is now generally considered to have been the worst merger of all time.

Hindsight is 20/20, but I clearly remember that when I first read it I thought it was a joke. I was working in Silicon Valley, as I still do today, and it was obvious that the Valley was in the boom part of its regular boom/bust cycle. Money was pouring into Internet companies, but it was obvious that it was a bubble headed for a crash. That’s not hindsight either, it’s what I and plenty of other people thought at the time.

So AOL’s stock price was obvious vapor, especially considering that cable modems were starting to spread and AOL had no obvious plans to get out of the cheap dialup world that it lived in. AOL’s walled garden had already disappeared into the wider Internet.

Time Warner, on the other hand, was a serious company with real products and real continuing customers. Actually they turned out to be heading into a terrible decade along with the rest of the media, but I didn’t see that coming. The notion that they would merge with AOL—actually AOL’s market cap was higher so it was more of an acquisition of Time Warner by AOL—seemed completely laughable to me.

A friend of mine suggested that the real goal was for Steve Case to put real grounding under AOL’s absurd market cap, by using it to buy a real company. I’m not sure I agree—I think Case may have really believed that AOL somehow deserved its market cap. I have no idea how he convinced everybody else involved. It just seems so obviously crazy on the face of it.

Of course, one thing arguing in favor of the merger is that our form of capitalism requires companies to always grow, which is very hard for very large companies to do. At some point, the personal incentives for executives are such they will do better if they do something even if it looks crazy, because doing nothing will certainly not lead to growth. I haven’t checked I’m sure the executives who agreed to the merger did fine out of it.

There have been mergers which I thought would fail but turned out to more-or-less succeed, such as HP/Compaq. But I always thought AOL/Time Warner one would fail, and on that one I was right.

Comments

Climate Change

Enough about programming for a moment. Here are some questions I have about climate change

1. Temperature measurements clearly show that the Arctic regions are warming up. Atmosphere measurements clearly show that the amount of carbon dioxide in the air is increasing. The physics showing that increasing carbon dioxide in the atmosphere can lead to increasing temperature are simple enough that I can understand them. This all hangs together nicely. It seems to me that people who argue that climate change is not occurring need to explain why increasing carbon dioxide does not make the earth warmer, and why the earth is getting warmer anyhow. That is, instead of having a neat explanation for what real measurements show, they have two puzzles. It does not seem parsimonious.

2. I’ve seen arguments that climate change may be occurring, but that it may not be due to human activity. To which I can only respond, who cares what causes it? We should still think about what to do about it.

3. I frequently see that it may be too costly to take the actions required to stop climate change. This argument seems like a misunderstanding of economics. Changes to the climate are a classic economic externality. Somebody has to pay for them, one way or another. If we don’t apply a carbon tax one way or another, then it will be applied for us later on. You can’t escape an externality by pretending that it doesn’t happen, the best you can do is push it onto somebody else. In our case we are currently pushing it onto future generations. That may be a rational choice, if we believe that future generations will be richer than we are. But we shouldn’t pretend that we can’t afford to stop climate change. We don’t have a choice about whether to pay for it.

4. In any case, adjusting for climate change only hurts the economy if we measure the economy in terms of consumption of natural resources. That is not a very relevant measure of the U.S. economy today. Shifting to different technologies will cost jobs in some areas and create job in others. We can grow the economy while using fewer natural resources. It’s entirely possible for us to shift to carbon neutral technologies without hurting the economy (that is, it is possible in an economic sense; it may or may not be possible in a technological sense). I’m not saying it will be easy, but it is certainly possible. The U.S. is currently letting countries like Germany and China get a significant technological lead in this area, but it’s probably not too late to catch up.

5. I really don’t understand why climate change has become a partisan issue in the U.S. There is nothing either Republican or Democratic about science, or about skepticism.

6. Freeman Dyson makes the very interesting point that about 8% of the carbon dioxide in the atmosphere is absorbed by vegetation every year. That means that biological processes can have enormous effects on carbon dioxide levels. We should be experimenting with biologically based forms of carbon capture. Probably people are already doing this.

7. The climate is already changing. I think the only interesting question now is whether we will prevent a significant rise in sea level. My current bet is that we won’t; history is definitely on the side of people avoiding dealing with environment issues until they are blatant. This is not a good time to invest in beachfront property.

Comments (15)

Cargo Cult Programming

I recently encountered a nice example of cargo cult programming. In bug 10980 Robert Wohlrab helpfully built a large number of Debian packages with the gold linker and reported errors about unknown options. These were options supported by the GNU linker but not by gold. (I’ve now added all the options to gold).

Among the options that packages used were -g and -assert. The GNU linker accepts and ignores these options. It has never done anything with them. Why do people pass them to the linker? I can only assume that they were copied from some other linker invocation.

In today’s increasingly complex world of programming, when so much code involves integrating libraries in various ways, I expect that cargo cult programming is on the rise.

Comments (1)

Generics

The goal of generics in a programming language is to save programmer time. Generics permit you to write code that works with multiple types only once. They also permit one programmer to write generic algorithms which you can use with your program’s types. This kind of thing is attractive to anybody who has written a lot of C code, where you find yourself writing the same linked list and hash table code over and over again. The core code will use void * pointers but for everything else you need type casts. Those type casts are mechanical and tedious.

I would say that the most significant advantage of C++ over C is the C++ mechanism for generic programming, which is templates. C++ has a powerful template system which is somewhat like a macro system built into the language proper. It’s not really a macro system—in a true macro system the code would be parsed only at template instantiation time, but in fact C++ code is parsed at template definition time—but I’ve discovered that people who are not deeply familiar with the language think of it that way, and it is an effective approach. C++ pays for this power in compilation time, as each template must be separately compiled for each type for which it is instantiated. It also pays in language complexity, which is most visible to users in the widely noted incomprehensible error messages. For a while the new C++ standard introduced the idea of concepts which would at least improve the error messages, but concepts themselves turned out to be very complex and were dropped from the standardization process for now.

Other languages have other approaches to generics. I know these languages less well, so I hope I don’t make any mistakes. Java restricts generics to types which are inherited from Object, which permits the templates to be instantiated a single time; a cost of Java’s approach is that the actual types are lost (this is known as type erasure). C# instantiates generics at runtime using a JIT; all types which are representable as pointers can share a single instantiation. ML has two different generics implementations: simple polymorphism which can be used for, e.g., type-safe containers, and a powerful system in which you can write a transformation which takes a set of types, variables, and functions and produces a different set; the latter system is pretty much as powerful as C++ templates but does not need to be compiled each time (I think) and produces comprehensible error messages. LISP has an extremely powerful macro system which simply rewrites the program, something which is relatively easy to do in LISP due to the simple syntax; this is so powerful that it lets you add new syntactic constructs to your program.

I’ve been thinking about generics because of Go. Go’s goals include fast compilation and efficient execution. Those goals are in conflict when it comes to generics. Go is not interpreted, and it is not the case that every type looks like a pointer. Therefore efficient execution suggests separate compilation for each instantiation, which of course slows down compilation time. ML’s approach is quite powerful, but it may be complex to use in simple cases, and it’s hard to understand how to implement it for Go. Relying on every type matching the empty interface doesn’t help when somebody wants to write a generic function for chan T, since chan int is not the same as chan interface{}.

Implementing generics in Go looks like a struggle among the goals of fast compilation time (i.e., avoid multiple compilation of generics), easy programming (i.e., have some form of generics), and fast execution (i.e., different types are represented differently). All languages have those goals in different degrees, and all languages seem to compromise on one of them when it comes to implementing generics. The question for Go is whether we can find an appropriate compromise.

Comments (18)

« Previous Page « Previous Page Next entries »