Linkers part 11

Archives

Archives are a traditional Unix package format. They are created by the ar program, and they are normally named with a .a extension. Archives are passed to a Unix linker either by file name or, more commonly, with the -l option.

Although the ar program is capable of creating an archive from any type of file, it is normally used to put object files into an archive. When it is used in this way, it creates a symbol table for the archive. The symbol table lists all the symbols defined by any object file in the archive, and for each symbol indicates which object file defines it. Originally the symbol table was created by the ranlib program, but these days ar creates it by default (despite this, many Makefiles continue to run ranlib unnecessarily).
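To make the symbol table concrete, here is a minimal Python sketch of the idea. It is not how ar stores anything on disk: the Member type, the build_symbol_table function, and the sample file names are all made up for illustration, and I am assuming that when two members define the same symbol the first one wins.

```python
from typing import Dict, List, NamedTuple, Set


class Member(NamedTuple):
    """Hypothetical stand-in for an object file stored in an archive."""
    name: str
    defined: Set[str]     # symbols this object file defines
    undefined: Set[str]   # symbols this object file references


def build_symbol_table(members: List[Member]) -> Dict[str, str]:
    """Map each defined symbol to the name of the member that defines it.

    Assumption for this sketch: the first member to define a symbol wins.
    """
    table: Dict[str, str] = {}
    for member in members:
        for symbol in sorted(member.defined):
            table.setdefault(symbol, member.name)
    return table


libdemo = [
    Member("a.o", {"foo"}, {"bar"}),
    Member("b.o", {"bar", "baz"}, set()),
]
print(build_symbol_table(libdemo))
```

The point of the table is that the linker can answer "which member defines foo?" without reading every object file in the archive.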

When the linker sees an archive, it looks at the archive’s symbol table. For each symbol the linker checks whether it has seen an undefined reference to that symbol without seeing a definition. If that is the case, it pulls the object file that defines the symbol out of the archive and includes it in the link. In other words, the linker pulls in all the object files which define symbols which are referenced but not yet defined.

This operation repeats until no more symbols can be defined by the archive. This permits object files in an archive to refer to symbols defined by other object files in the same archive, without worrying about the order in which they appear.
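The repeat-until-nothing-changes behaviour can be sketched in a few lines of Python. This is a simplified model, not the code of any real linker: the archive is a hypothetical dictionary from member name to the symbols it defines and references, and search_archive is a name I made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical archive model: member name -> (defined symbols, referenced symbols).
Archive = Dict[str, Tuple[Set[str], Set[str]]]


def search_archive(archive: Archive, defined: Set[str], undefined: Set[str]) -> List[str]:
    """Pull in members until the archive cannot define anything more.

    Updates `defined` and `undefined` in place and returns the names of
    the members that were included, in the order they were pulled in.
    """
    included: List[str] = []
    progress = True
    while progress:
        progress = False
        for name, (defs, refs) in archive.items():
            if name in included:
                continue
            # Include the member only if it defines something still undefined.
            if defs & undefined:
                included.append(name)
                defined |= defs
                undefined |= refs
                undefined -= defined
                progress = True
    return included


# b.o comes first in the archive, but it is only needed once a.o is pulled in.
libdemo: Archive = {
    "b.o": ({"bar"}, set()),
    "a.o": ({"foo"}, {"bar"}),
    "c.o": ({"unused"}, set()),
}
defined, undefined = {"main"}, {"foo"}
print(search_archive(libdemo, defined, undefined))
```

Here a.o is pulled in for foo, which creates a new reference to bar, so a second pass picks up b.o even though it appears earlier in the archive. Nothing references c.o, so it stays out of the link.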

Note that the linker considers an archive in its position on the command line relative to other object files and archives. If an object file appears after an archive on the command line, that archive will not be used to define symbols referenced by the object file.
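The effect of command line order is easy to see in a toy model. The following Python sketch processes inputs strictly left to right; the link function and the tuple layout of the inputs are assumptions of mine for illustration, not a real linker interface.

```python
from typing import Set

# Hypothetical input model, not a real linker API:
#   ("object",  (defined_symbols, referenced_symbols))
#   ("archive", [(defined_symbols, referenced_symbols), ...])


def link(inputs) -> Set[str]:
    """Process inputs left to right and return the symbols left undefined."""
    defined: Set[str] = set()
    undefined: Set[str] = set()

    def add_object(defs, refs):
        defined.update(defs)
        undefined.update(refs)
        undefined.difference_update(defined)

    for kind, payload in inputs:
        if kind == "object":
            add_object(*payload)
            continue
        # Archive: only members that satisfy a reference seen so far are used.
        pulled = True
        while pulled:
            pulled = False
            for defs, refs in payload:
                if defs & undefined:
                    add_object(defs, refs)
                    pulled = True
    return undefined


main_o = ("object", ({"main"}, {"foo"}))
libfoo_a = ("archive", [({"foo"}, set())])

print(link([main_o, libfoo_a]))   # archive after the object that needs it
print(link([libfoo_a, main_o]))   # archive before the object that needs it
```

In the first call the reference to foo is already known when the archive is searched, so the member is pulled in. In the second call the archive is searched while nothing is undefined, so foo is still unresolved at the end.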

In general the linker will not pull an object file out of an archive merely because it provides a definition for a common symbol. You will recall that if the linker sees a common symbol followed by a defined symbol with the same name, it will treat the common symbol as an undefined reference. That will only happen if there is some other reason to include the defined symbol in the link; the defined symbol will not be pulled in from the archive.

There was an interesting twist for common symbols in archives on old a.out-based SunOS systems. If the linker saw a common symbol, and then saw a common symbol in an archive, it would not include the object file from the archive, but it would change the size of the common symbol to the size in the archive if that were larger than the current size. The C library relied on this behaviour when implementing the stdin variable.
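The two rules about common symbols can be written down as a small Python sketch. This only restates the behaviour described above under simplifying assumptions; the Symbol class and both function names are hypothetical, and it does not reproduce the actual SunOS or GNU linker code.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical record of what the linker knows about one symbol."""
    state: str      # "undefined", "common", or "defined"
    size: int = 0   # only meaningful for common symbols in this sketch


def archive_member_wanted(current: Symbol) -> bool:
    """General rule: only an undefined reference pulls in an archive member.

    A symbol that is already common (or defined) does not.
    """
    return current.state == "undefined"


def sunos_merge_common(current: Symbol, archive_common_size: int) -> bool:
    """Old SunOS a.out twist: a common symbol in an archive member.

    The member is not included, but the common symbol grows to the
    archive's size if that is larger. Returns whether the member is
    included, which is always False here.
    """
    if current.state == "common" and archive_common_size > current.size:
        current.size = archive_common_size
    return False


sym = Symbol("common", size=4)
included = sunos_merge_common(sym, archive_common_size=8)
print(included, sym.size)
```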

My next posting should be on Monday.

11 Comments

  1. baruch said,

    October 9, 2007 @ 7:29 am

    What is the reason for the order between the archives and the object files? It can make life easier if the order doesn’t matter and you can just place all objects and archives on the command line and let the linker sort it all out.

    I believe the Microsoft linker doesn’t care much about order.

  2. Ian Lance Taylor said,

    October 9, 2007 @ 9:32 pm

    Thanks for the note.

    I suspect that the original reason for the ordering was just simplicity. In the original Unix linkers, even the archives were searched in order; there was no archive symbol table. The tsort program, which can still be found on a Unix system near you, was used to sort the object files so that the ones which satisfied references of objects in the archive were found later in the archive. The lorder shell script built a partial order of dependencies, called tsort to build the total order, and built the archive in that order.

    Now that the ordering has been established, people take advantage of it to interpose libraries, so that you can supply your own definitions of functions overriding the ones in an archive.

    Come to think of it, I never got around to discussing interposition of shared libraries. I’ll try to remember to do that some day.

  3. baruch said,

    October 9, 2007 @ 10:39 pm

    Thanks for the information on how to get the objects and archives automatically sorted. I don’t care much about the games that can be played, I just want the simplicity of letting the computer do the work I want it to do with the least amount of work on my part.

    FWIW, I’d be happy to beta test your gold linker, the application at my workplace takes several minutes to link, getting it down will be so nice. I’m willing to act as a guinea pig even for an incomplete linker ;-)

  4. Ian Lance Taylor said,

    October 10, 2007 @ 9:29 am

    Keep an eye on the binutils mailing list (see http://sourceware.org/binutils/). I’ll announce gold there when it is ready to beta test.

  5. avjo said,

    November 7, 2007 @ 12:30 am

    > The C library relied on this behaviour when implementing
    > the stdin variable.

    Interesting! Can you please elaborate ? Thanks !

  6. Ian Lance Taylor said,

    November 7, 2007 @ 6:23 am

    Unfortunately, I don’t remember the exact details of the SunOS 4 a.out representation of stdin. I remember that until the GNU linker implemented the common symbol handling I described–adjust the size of the common symbol but do not include the archive member–it did not work correctly. It made sense at the time, but now I would have to look at an old SunOS system to recreate exactly what happens and why.

    I remember that it didn’t have to work that way. It was just the way that libc.a happened to be implemented.

  7. avjo said,

    January 11, 2008 @ 7:58 am

    Hi Ian,

    I have a question about the process of linking archives.
    Let’s say the linker had an undefined symbol A,
    which it found in an archive, and therefore pulled
    out the whole object file in which the defined symbol A
    resided. So now a whole new object joins the party.
    A is resolved, which is good.
    But what about other symbols that the new object file
    might have ? E.g. let’s say there was another undefined symbol
    B which was already resolved to a weak symbol, but now,
    in the new object, there is a strong definition of B. Shouldn’t
    the linker now take the new definition of B instead of the
    previously resolved one ? I guess it should, but for that,
    it needs to check all symbols of the new object file that
    was just added. Does it do that ? Or maybe did I miss something
    here ?
    Thanks!
    ~avjo

  8. Ian Lance Taylor said,

    January 11, 2008 @ 6:21 pm

    Yes, when the new object file is pulled into the link, all its symbols are checked. The new definition of B will take precedence over the previous weak definition of B.

    Once the linker decides to pull an object in from an archive, that object is treated as though the user named it on the command line.

  9. avjo said,

    January 11, 2008 @ 9:11 pm

    Thanks Ian !

  10. haizaar said,

    April 30, 2008 @ 2:49 am

    “Once the linker decides to pull an object in from an archive, that object is treated as though the user named it on the command line.” Which means that all unresolved symbols from that object (which are probably even unrelated to my program) will be resolved as well?

  11. Ian Lance Taylor said,

    April 30, 2008 @ 5:19 pm

    Yes: when an object comes in from an archive, any undefined references that it makes must be satisfied. For example, they may be satisfied by pulling in other objects from the same archive.
