Linkers part 1

I’ve been working on and off on a new linker. To my surprise, I’ve discovered in talking about this that some people, even some computer programmers, are unfamiliar with the details of the linking process. I’ve decided to write some notes about linkers, with the goal of producing an essay similar to my existing one about the GNU configure and build system.

As I only have the time to write one thing a day, I’m going to do this on my blog over time, and gather the final essay together later. I believe that I may be up to five readers, and I hope y’all will accept this digression into stuff that matters. I will return to random philosophizing and minding other people’s business soon enough.

A Personal Introduction

Who am I to write about linkers?

I wrote my first linker back in 1988, for the AMOS operating system which ran on Alpha Micro systems. (If you don’t understand the following description, don’t worry; all will be explained below). I used a single global database to register all symbols. Object files were checked into the database after they had been compiled. The link process mainly required identifying the object file holding the main function. Other object files were pulled in by reference. I reverse engineered the object file format, which was undocumented but quite simple. The goal of all this was speed, and indeed this linker was much faster than the system one, mainly because of the speed of the database.
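To make the "pulled in by reference" idea concrete, here is a minimal Python sketch of a link driven by a global symbol database. It is an illustration only, not a reconstruction of the AMOS linker: the names SymbolDatabase, check_in, and link, and the example object files, are all hypothetical.

```python
# Toy model of a link driven by a global symbol database.
# All names here (SymbolDatabase, check_in, link) are hypothetical.

class SymbolDatabase:
    def __init__(self):
        self.definer = {}     # symbol name -> object file that defines it
        self.references = {}  # object file -> set of symbols it refers to

    def check_in(self, obj_name, defines, refers_to):
        """Register a compiled object file and its symbols."""
        for sym in defines:
            self.definer[sym] = obj_name
        self.references[obj_name] = set(refers_to)


def link(db, entry_symbol="main"):
    """Return the object files needed, starting from the one defining entry_symbol."""
    if entry_symbol not in db.definer:
        raise KeyError(f"undefined symbol: {entry_symbol}")
    needed = []
    worklist = [db.definer[entry_symbol]]
    while worklist:
        obj = worklist.pop()
        if obj in needed:
            continue
        needed.append(obj)
        # Pull in whichever object defines each symbol this one refers to.
        for sym in sorted(db.references[obj]):
            if sym not in db.definer:
                raise KeyError(f"undefined symbol: {sym}")
            worklist.append(db.definer[sym])
    return needed


db = SymbolDatabase()
db.check_in("main.o", defines=["main"], refers_to=["print_sum"])
db.check_in("sum.o", defines=["print_sum"], refers_to=["add"])
db.check_in("add.o", defines=["add"], refers_to=[])
db.check_in("unused.o", defines=["helper"], refers_to=[])
print(link(db))  # ['main.o', 'sum.o', 'add.o']
```

Note that unused.o never appears in the result: nothing reachable from main refers to a symbol it defines, so it is never pulled in.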

I wrote my second linker in 1993 and 1994. This linker was designed and prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). This was a complete reimplementation of the BFD-based linker which Steve had written a couple of years before. The primary targets were a.out and COFF. Again the goal was speed, especially compared to the original BFD-based linker. On SunOS 4 this linker was almost as fast as running the cat program on the input .o files.

The linker I am now working on, called gold, will be my third. It is exclusively an ELF linker. Once again, the goal is speed, in this case being faster than my second linker. That linker has been significantly slowed down over the years by adding support for ELF and for shared libraries. This support was patched in rather than being designed in. Future plans for the new linker include support for incremental linking, which is another way of increasing speed.

There is an obvious pattern here: everybody wants linkers to be faster. This is because the job which a linker does is uninteresting. The linker is a speed bump for a developer, a process which takes a relatively long time but adds no real value. So why do we have linkers at all? That brings us to our next topic.

A Technical Introduction

What does a linker do?

It’s simple: a linker converts object files into executables and shared libraries. Let’s look at what that means. For cases where a linker is used, the software development process consists of writing program code in some language: e.g., C or C++ or Fortran (but typically not Java, as Java normally works differently, using a loader rather than a linker). A compiler translates this program code, which is human readable text, into another form of human readable text known as assembly code. Assembly code is a readable form of the machine language which the computer can execute directly. An assembler is used to turn this assembly code into an object file. For completeness, I’ll note that some compilers include an assembler internally, and produce an object file directly. Either way, this is where things get interesting.

In the old days, when dinosaurs roamed the data centers, many programs were complete in themselves. In those days there was generally no compiler (people wrote directly in assembly code), and the assembler actually generated an executable file which the machine could execute directly. As languages like Fortran and Cobol started to appear, people began to think in terms of libraries of subroutines, which meant that there had to be some way to run the assembler at two different times, and combine the output into a single executable file. This required the assembler to generate a different type of output, which became known as an object file (I have no idea where this name came from). And a new program was required to combine different object files together into a single executable. This new program became known as the linker (the source of this name should be obvious).
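What "combining object files into a single executable" involves can be sketched in a few lines of Python. This is a toy model under invented assumptions: the object layout (a list of code words plus a table of defined symbols and a table of references), the instruction names, and the helpers make_object and link are all hypothetical, and real object file formats are considerably more involved.

```python
# Toy model: an "object file" is a list of code words plus two tables.
# The layout is invented for illustration only.

def make_object(code, defines, refs):
    # defines: symbol -> offset within this object's code
    # refs: offset within this object's code -> symbol whose address goes there
    return {"code": list(code), "defines": dict(defines), "refs": dict(refs)}


def link(objects):
    """Concatenate the objects' code and fill in the symbol references."""
    image = []
    addresses = {}  # symbol -> final address in the combined image
    pending = []    # (final address of the slot to fill, symbol)
    for obj in objects:
        base = len(image)  # where this object's code lands in the image
        for sym, off in obj["defines"].items():
            if sym in addresses:
                raise ValueError(f"duplicate symbol: {sym}")
            addresses[sym] = base + off
        for off, sym in obj["refs"].items():
            pending.append((base + off, sym))
        image.extend(obj["code"])
    # Every symbol's final address is now known, so fill in the references.
    for slot, sym in pending:
        if sym not in addresses:
            raise ValueError(f"undefined symbol: {sym}")
        image[slot] = addresses[sym]
    return image, addresses


# main.o calls a routine that it does not define; lib.o supplies it.
main_o = make_object(["CALL", 0, "HALT"], defines={"main": 0}, refs={1: "square"})
lib_o = make_object(["MUL", "RET"], defines={"square": 0}, refs={})
image, addresses = link([main_o, lib_o])
print(image)      # ['CALL', 3, 'HALT', 'MUL', 'RET']
print(addresses)  # {'main': 0, 'square': 3}
```

The placeholder 0 after CALL is replaced by 3, the address at which square ends up once the two pieces of code have been laid out one after the other. Neither object file could know that address on its own, which is why a separate combining step is needed.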

Linkers still do the same job today. In the decades since, one new feature has been added: shared libraries.

More tomorrow.

15 Comments »

  1. rmathew said,

    August 23, 2007 @ 1:53 am

    I am looking forward to the rest of this series. I hope you will also touch upon sorting out template instantiations, doing link-time optimisations, etc. that put additional burdens on a linker making it slower, though also more useful.

    PS: I’m glad that you have started blogging regularly. I like it that you seem to have thought through most of the issues that you blog about, even if at times I don’t find myself agreeing with your conclusions.

  2. ncm said,

    August 23, 2007 @ 7:03 pm

    I too am looking forward to the rest of the series. It seems to me that it’s getting hard to make something recognizable, any more, as a classical linker, now that code generation and optimization are essential parts of the job.

  3. movement said,

    August 23, 2007 @ 7:41 pm

    Linkers and related topics such as runtime loaders and shared libraries
    are a mystery at some level to most programmers, I find. It initially took
    me a lot of time staring at assembly output to work out how link-time
    relocations actually worked: both the Solaris linker and the GNU one are
    pretty impenetrable if you’re just browsing. It’ll be very interesting to hear
    some details from you on these topics.

    Some related reading:

    Sun’s Linkers and Libraries Guide
    http://blogs.sun.com/rie/
    http://blogs.sun.com/msw/
    http://blogs.sun.com/ali/

    John Levine’s old book Linkers and Loaders too. I didn’t get much out of
    this book; unfortunately it was pretty outdated, and not very clearly put
    together. I suspect the exercises would prove interesting to do though.

  4. Ian Lance Taylor said,

    August 23, 2007 @ 8:27 pm

    Thanks for the notes.

    I tend to view template instantiation and link-time optimizations as separate from the linker proper. In implementations I know about, they are done before invoking the linker itself, or they are done via plugins which the linker invokes. That is, under the hood, there is still a classical linker.

    But since there is interest, perhaps I will move on to those topics after covering the linker proper.

  5. Ivan said,

    August 23, 2007 @ 8:41 pm

    Great Idea,

    I look forward to reading these entries.

    Thanks,
    Ivan Novick
    http://www.0x4849.net

  6. christian schorn » Blog Archive » links for 2007-08-30 said,

    August 30, 2007 @ 1:22 pm

    [...] Airs – Ian Lance Taylor » Linkers part 1 (tags: programming basics) [...]

  7. Mark J. Wielaard » Ian Lance Taylor’s Linker Notes said,

    August 31, 2007 @ 8:21 am

    [...] Linkers part 1 – A Personal Introduction and A Technical Introduction. [...]

  8. jrlevine said,

    September 13, 2007 @ 3:32 pm

    Nice series. Believe it or not, relocating loaders predate assemblers, with the first one in the late 1940s, and linking loaders aren’t much later. This technology goes way back.

    Also, I was kind of surprised at the comment that my book was outdated. One of the reasons I wrote it was that linker technology changes so slowly. There hasn’t been an interesting new idea since incremental linkers about 20 years ago, and knowledge of linkers has been mostly programmer folklore, so I figured I’d write it down so it’d be at last available somewhere. The descriptions of ELF, ECOFF, and the way they support dynamic linking are as far as I know still current; nothing’s changed since I wrote the book in 2000.

  9. Ian Lance Taylor said,

    September 13, 2007 @ 8:30 pm

    Thanks for the note. There is a lot I don’t know about the history.

    I didn’t make the comment about your book myself; it’s certainly the best description of linkers I know of. Still, unless I misremember, there are some recent important ideas which aren’t covered, such as ELF symbol versions, ELF (and Mach-O) symbol visibility, interposition with LD_PRELOAD and the like, TLS details. I don’t actually have a copy to hand, so I hope I am not misrepresenting it. These are not major ideas like incremental linking, but they are things which the relatively few people who work with linkers need to understand.

  10. avjo said,

    November 4, 2007 @ 11:30 pm

    Hi Ian,

    This series is so educating and interesting ! Thank
    you for that !

    Just wondering here… will your new linker be GPLv2 or GPLv3 ?

    ~avjo

  11. Ian Lance Taylor said,

    November 5, 2007 @ 6:07 am

    The goal is for the new linker to be part of the GNU binutils, which means that it will be GPLv3.

    (I’ll add that I think that in practice there is very little difference between GPLv2 and GPLv3.)

  12. E. Huntley - Programming & Development Blog » Blog Archive » Back to the “Basics” said,

    February 23, 2009 @ 8:48 am

    [...] The entries start here. And continue through his entry archives to mid September 2007. I highly recommend giving at least the first few entries a quick read-through if you are like me, and want a better understanding of the development tools we use every day. [...]

  13. zur::Linux » Gold Linker said,

    September 1, 2011 @ 10:02 am

    [...] If you want to know more about linkers and Gold in particular Ian Lance Taylor has a twenty-part series about linker internals on his blog. [...]

  14. Yearzero.flaminghorns.com − September is Linker Month!!! said,

    August 27, 2013 @ 6:50 am

    [...] There are around 20 odd articles to be read. The link series for the article is as follows: http://www.airs.com/blog/archives/38 http://www.airs.com/blog/archives/57 Tagged and categorized as: General | No comments [...]

  15. Relocatable objects - BYTEC/16 said,

    December 16, 2013 @ 1:03 pm

    [...] code generation, and that’s exactly what symbol tables and relocation tables are used for. I used this material by Ian Lance Taylor to understand the basics of [...]
