Concurrent linking

There is still work to do on gold. But once that is done, what is the next big step? In the long term we need an incremental linker, but I’m also interested in an idea which I call concurrent linking. A concurrent linker runs at the same time as the compiler. As each object file is compiled, the linker is notified, and the linker adds the object file to the executable that it is building. When the last compilation is completed, the linker writes out the fully linked executable.

The idea is to keep the linker from being a serializing step in a compilation. It is normally easy to run many compilations in parallel, as each compilation is independent. Traditionally, however, the linker cannot start until all the compilations are complete, and it must read all the input files at that time.

A concurrent linker can instead run at the same time as the compilations. There is no long wait after the compilations are complete. Also, the newly generated object file should be in the disk cache, and so the linker will have to do less actual disk I/O. I expect that this would only be a noticeable improvement for large programs, but then those are the cases where the linker really is a bottleneck.

The key to making this work will be careful control of the symbol table, and efficient tracking of relocations. In general, relocations for an object file may only be resolved when they refer to symbols defined in objects which appear earlier on the link command line. In other cases we must remember the relocation. If the symbol is already known to be defined in an object which appears later on the link line, it would probably be appropriate to resolve the relocation to that symbol, but to record it in case the symbol definition is changed later.
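
To make the bookkeeping concrete, here is a rough sketch in C of the kind of record a concurrent linker might keep for a not-yet-resolved symbol. This is only an illustration, not gold’s actual data structures; apply_reloc stands in for whatever routine patches the output file.

    /* Sketch only: for each symbol without a final definition, remember
       the relocations that refer to it, so they can be patched (or
       re-patched) when a later object file supplies or changes the
       definition.  */

    struct object_file;

    struct pending_reloc
    {
      struct object_file *object;     /* object containing the relocation */
      unsigned long offset;           /* where in the output file to patch */
      int type;                       /* relocation type */
      struct pending_reloc *next;
    };

    struct symbol
    {
      const char *name;
      int is_defined;                 /* do we have a definition yet? */
      unsigned long value;            /* address, valid if is_defined */
      struct pending_reloc *pending;  /* relocations waiting on this symbol */
    };

    /* Hypothetical helper: write the resolved value into the output file.  */
    void apply_reloc (struct object_file *, unsigned long, int, unsigned long);

    /* Called when a newly arrived object file defines SYM: resolve the
       relocations that were recorded while SYM was still undefined.  */
    void
    resolve_pending (struct symbol *sym)
    {
      struct pending_reloc *r;
      for (r = sym->pending; r != NULL; r = r->next)
        apply_reloc (r->object, r->offset, r->type, sym->value);
      sym->pending = NULL;
    }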

In general an executable linked in this way may have additional text and data segments, and they won’t be as tightly packed. Thus the resulting executable will most likely have slightly worse paging behaviour and will run slightly slower. So this technique is only appropriate during the development cycle, not during a release build.


15 responses to “Concurrent linking”

  1. fche

    Perhaps “pipelined linking” might be a more descriptive term?

  2. Ian Lance Taylor

    I like “concurrent” since the linker and the compiler are running at the same time. But I guess that is true of a pipeline as well. OK, concurrent just sounds better to me.

  3. tromey

    One generic problem with things like this is figuring out how to fit it into existing build systems. Here, make isn’t going to start the linker until everything else is done; it might not even know how to start the linker until then (e.g., one might wildcard on the produced objects).

    But, if it is a big enough win, I suppose the maintainers will solve that problem for themselves.

    One idea I had is that, instead of linking at all, for development we would just do “jit linking”. The “linker” would simply write out a list of the objects included in the link, and a stub main(). This stub would use a special library to load in the object files on demand at runtime.

    BTW, your idea here sounds like a great adjunct to the incremental compiler :-). In particular perhaps I could eliminate the step where small objects are turned into the objects the user asks for, in favor of just having the concurrent linker do it.

  4. ncm

    Seems like inotify is a new OS feature that could make a difference in how one goes about this. You might start the linker before make, and point it at an existing executable, which can know where all the objects that comprise it came from. (Will inotify tell you when a .o file is closed by the compiler?) As the compiler finishes each .o, the linker quietly picks it up and stitches it into place. When make finally runs ld, the background linker treats that as punctuation. Then, depending on “-O” level, it chooses how much cleanup to bother about.

  5. Ian Lance Taylor

    Yeah, I don’t know the best interface to this feature. It doesn’t fit well with existing build systems. I don’t know whether inotify would work, but even if it did one would still have to tell the build system to run the linker first.

    Fortunately build systems are getting a little more abstract–while it would be awkward to put this into make, it would be much easier to put it into quagmire. quagmire would generate one Makefile target to start the linker, and one to wait for the linker to finish. Split compilation targets into two parts: one to compile, and one phony target to notify the linker. The executable depends on the phony targets, and the phony targets depend on the object files (and on the target which starts the linker). Each phony target notifies the linker that its object file is ready. The real executable target depends on all the phony targets and then waits for the linker to complete.

    You could approximate jit linking by turning every .o into a .so–that would be a fairly fast operation. Then the main function would dlopen them all.
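
    As a sketch of what that stub could look like (just an illustration: the object list and the entry point name “program_main” are invented, and the stub would be linked with -ldl):

    #define _GNU_SOURCE             /* for RTLD_DEFAULT on glibc */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Invented for illustration: the shared objects produced in place of
       a real link.  */
    static const char *objects[] = { "./a.so", "./b.so", "./d.so", NULL };

    int
    main (int argc, char **argv)
    {
      int (*real_main) (int, char **);
      int i;

      /* Load every shared object with its symbols globally visible, so
         that references between the objects resolve.  */
      for (i = 0; objects[i] != NULL; i++)
        if (dlopen (objects[i], RTLD_NOW | RTLD_GLOBAL) == NULL)
          {
            fprintf (stderr, "dlopen %s: %s\n", objects[i], dlerror ());
            exit (1);
          }

      /* "program_main" stands in for the program's real entry point.  */
      real_main = (int (*) (int, char **)) dlsym (RTLD_DEFAULT, "program_main");
      if (real_main == NULL)
        {
          fprintf (stderr, "dlsym: %s\n", dlerror ());
          exit (1);
        }
      return real_main (argc, argv);
    }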

  6. ncm

    Or you could leave the incremental linker running all the time, watching your current list of executables.

  7. landley

    Concurrent linking is essentially how tinycc works. The linker is built into the compiler; it assembles the ELF segments in RAM as it’s compiling the software, and then writes them out towards the end.

    I’ve actually spent a large chunk of the past week reading the ELF spec so I can improve the version of it I inherited when Fabrice Bellard lost interest in the project (he does QEMU these days). I’m turning it into a busybox-style swiss army knife executable that has various different “cc”, “ld”, “as”, “ar” etc. functions…

    I’m also extending it until it can build an unmodified Linux kernel, but there’s rather a lot of work to do before then. 🙂

  8. Ian Lance Taylor

    Thanks for the note. That sounds cool. It seems like it doesn’t really permit a distributed compilation model, though. I suppose that’s OK for a project named “tiny”.

  9. fprog

    I like the inotify-style approach;
    also, ./configure scripts could be changed to add similar code
    if gold is found, and the Makefile could be EASILY changed like this:

    GCC=gcc
    RM=@rm -f
    CP=@cp -p
    CFLAGS=-O2 -g
    LDFLAGS=-lc
    CLEAR=@clear
    EXT=.x86
    RMEXT=*.x86

    HAVE_GOLD := $(wildcard /usr/bin/gold)

    ifeq ($(strip $(HAVE_GOLD)),)
    # Do not create, change only access time (i.e. a harmless dummy command)
    PRELINK=@touch -c -a
    LD=gcc
    else
    # Gold is present
    PRELINK=gold --prelink $(LDFLAGS) -o
    LD=gold --postlink
    endif

    all: a$(EXT) b$(EXT) d$(EXT)
    @

    clear:
    $(CLEAR)

    clean:
    $(RM) a b d *.o a.out $(RMEXT)

    a.o: a.h a.c
    @echo "MUST add this line to prelink"
    $(PRELINK) a$(EXT) a.o &
    $(PRELINK) d$(EXT) d.o &
    $(GCC) $(CFLAGS) -c a.c -o a.o

    b.o: b.h b.c
    @echo "MUST add this line to prelink"
    $(PRELINK) b$(EXT) b.o &
    $(PRELINK) d$(EXT) d.o &
    $(GCC) $(CFLAGS) -c b.c -o b.o

    d.o: d.h d.c
    @echo "MUST add this line to prelink"
    $(PRELINK) d$(EXT) d.o &
    $(GCC) $(CFLAGS) -c d.c -o d.o

    a$(EXT): a.o
    @echo "Wrap up the linking process"
    $(LD) $(LDFLAGS) -o a$(EXT) a.o

    b$(EXT): b.o
    @echo "Wrap up the linking process"
    $(LD) $(LDFLAGS) -o b$(EXT) b.o

    d$(EXT): a.o b.o d.o
    @echo "Wrap up the linking process"
    $(LD) $(LDFLAGS) -o d$(EXT) a.o b.o d.o

    run: a$(EXT) b$(EXT) d$(EXT)
    ./a$(EXT)
    ./b$(EXT)
    ./d$(EXT)

    # make clean clear all run

    In this approach you have 3 instances of gold --prelink working on d.x86,
    and one final instance doing the last operation,
    so some locking/sharing/synchronization will have to be used somehow.
    For instance, the gold --prelink processes could “talk” via a shared file:
    d$(EXT).prebuild

    Another approach is this:

    GCC=gcc
    RM=@rm -f
    CP=@cp -p
    CFLAGS=-O2 -g
    LDFLAGS=-lc
    CLEAR=@clear
    EXT=.x86
    RMEXT=*.x86

    HAVE_GOLD := $(wildcard /usr/bin/gold)

    ifeq ($(strip $(HAVE_GOLD)),)
    # Do not create, change only access time (i.e. a harmless dummy command)
    PRELINK=@touch -c -a
    LD=gcc
    else
    PRELINK=gold --prelink $(LDFLAGS) -o
    LD=gold --postlink
    endif

    all: b$(EXT) d$(EXT)
    @

    clear:
    $(CLEAR)

    clean:
    $(RM) a b d *.o a.out $(RMEXT)

    a.o: a.h a.c
    $(GCC) $(CFLAGS) -DNOMAIN -c a.c -o a.o

    b.o: b.h b.c
    $(GCC) $(CFLAGS) -c b.c -o b.o

    d.o: d.h d.c
    $(GCC) $(CFLAGS) -c d.c -o d.o

    # Add these rules
    b$(EXT).prebuild:
    @echo "prebuild b"
    $(PRELINK) b$(EXT) a.o b.o

    d$(EXT).prebuild:
    @echo "prebuild d"
    $(PRELINK) d$(EXT) a.o d.o

    b$(EXT): b$(EXT).prebuild a.o b.o
    @echo "Wrap up the linking process"
    $(LD) $(LDFLAGS) -o b$(EXT) a.o b.o

    d$(EXT): d$(EXT).prebuild a.o d.o
    @echo "Wrap up the linking process"
    $(LD) $(LDFLAGS) -o d$(EXT) a.o d.o

    Now you have only one gold --prelink, which exits once all files are linked
    and dumps the results into d$(EXT).prebuild;
    gold --postlink then finds the d$(EXT).prebuild file
    and does the last bit of linking…

    An even simpler way is to have $(LD) do “nothing”,
    similar to LD=@touch -c -a,
    and pass $(LDFLAGS) to $(PRELINK),
    but that may be inconvenient
    if the executables do not have the same parameters…

    So another way is this, which is easy to script out:

    GCC=gcc
    RM=@rm -f
    CP=@cp -p
    CFLAGS=-O2 -g
    LDFLAGS=-lc
    CLEAR=clear
    EXT=.x86
    RMEXT=*.x86

    HAVE_GOLD := $(wildcard /usr/bin/gold)

    # Check if HAVE_GOLD is not found and null
    ifeq ($(strip $(HAVE_GOLD)),)
    # Returns true and never executes the inner content
    PRELINK=@true || (false && echo "
    PRELINKEND=")
    LD=gcc
    LDEND=
    else
    PRELINK=gold --prelink
    PRELINKEND=
    LD=@true || (false && echo "
    LDEND=")
    endif

    all: b$(EXT) d$(EXT)
    @

    clear:
    $(CLEAR)

    clean:
    $(RM) a b d *.o a.out $(RMEXT)

    a.o: a.h a.c
    $(GCC) $(CFLAGS) -DNOMAIN -c a.c -o a.o

    b.o: b.h b.c
    $(GCC) $(CFLAGS) -c b.c -o b.o

    d.o: d.h d.c
    $(GCC) $(CFLAGS) -c d.c -o d.o

    b.prebuild:
    @echo "prebuild b - Copy-paste from below with PRELINK..PRELINKEND prefix/suffix"
    $(PRELINK) $(LDFLAGS) -o b$(EXT) a.o b.o $(PRELINKEND)

    d.prebuild:
    @echo "prebuild d - Copy-paste from below with PRELINK..PRELINKEND prefix/suffix"
    $(PRELINK) $(LDFLAGS) -o d$(EXT) a.o d.o $(PRELINKEND)

    b$(EXT): b.prebuild a.o b.o
    @echo "no post processing"
    $(LD) $(LDFLAGS) -o b$(EXT) a.o b.o $(LDEND)

    d$(EXT): d.prebuild a.o d.o
    @echo "no post processing"
    $(LD) $(LDFLAGS) -o d$(EXT) a.o d.o $(LDEND)

    It all depends on how much work it is
    and how easy life should be for Makefile maintainers…

    My 2 cents.

    P.S.: Thanks for sharing gold with the community =)
    Anything that can make compilation/linking time smaller
    and our life simpler is always appreciated.
    I saw MIPS, but is anyone working on having ARM support too?

  10. astrange

    Apple had a jit linking system called Zero-Link which was mostly famous for not working and annoying everyone who used it… I’m not sure if that’s a fundamental problem with the idea, though.

  11. Ian Lance Taylor

    fprog: thanks for the examples. Do they still work if the .o file is up-to-date? What invokes the prelink step in that case?

    I don’t know of anybody working on an ARM port yet.

    astrange: ZeroLink is different from concurrent linking. ZeroLink postpones the linking until runtime. Concurrent linking does a more-or-less regular link, it just does it at the same time as the compilation.

  12. DonLindsay

    I’m a fan of inotify, but it’s wrong for this. Large build systems leave multiple executables, not one, so the build system has to say which object file goes to which link pipeline.

    Also, each of the plural link invocations has its own parameters, which I don’t think you want to smuggle to it via the GCC invocations.

    So I’d give gold some sort of --daemon (OK, --concurrent) flag which makes it leave a named pipe that it then listens on. The build system could follow each GCC invocation with an echo to a pipe of the new object file’s name. This (or some similar scheme) allows multiple consumer invocations, all loosely coupled to the set of producers, without having to write new locking and buffering code.
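
    The daemon’s reading side could then be quite small, roughly along these lines (a sketch only: the pipe path is invented, and the printf stands in for handing the object file to the link):

    #include <stdio.h>
    #include <string.h>

    int
    main (void)
    {
      char name[4096];
      /* The pipe path is made up; each gcc invocation would be followed
         by an echo of the new object file's name into it.  */
      FILE *pipe = fopen ("/tmp/gold-concurrent", "r");
      if (pipe == NULL)
        return 1;
      /* Caveat: with a real FIFO the daemon must reopen the pipe (or keep
         a writer open itself) so that EOF from one echo does not end the
         loop early.  */
      while (fgets (name, sizeof name, pipe) != NULL)
        {
          name[strcspn (name, "\n")] = '\0';
          if (name[0] != '\0')
            printf ("linking in %s\n", name);  /* hand the object to the linker */
        }
      return 0;
    }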

    How would the build system know that a link daemon was done? Via a ‘finish’ command?

  13. Ian Lance Taylor

    I think you *could* use some sort of inotify based code, provided you invoked the linker with the set of objects it had to wait for. I think you would have to first remove all the objects which were going to change, to avoid a race condition. Of course this approach wouldn’t work well with -L -l.
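
    (For what it’s worth, the watching loop itself is small. Here is a sketch that waits for .o files to be closed in the current directory; IN_CLOSE_WRITE fires when the compiler closes the file it was writing, and the printf stands in for handing the object to the link.)

    #include <stdio.h>
    #include <string.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    int
    main (void)
    {
      /* Buffer aligned suitably for struct inotify_event.  */
      char buf[4096]
        __attribute__ ((aligned (__alignof__ (struct inotify_event))));
      int fd = inotify_init ();

      if (fd < 0 || inotify_add_watch (fd, ".", IN_CLOSE_WRITE) < 0)
        return 1;
      for (;;)
        {
          ssize_t len = read (fd, buf, sizeof buf);
          ssize_t i = 0;
          if (len <= 0)
            break;
          while (i < len)
            {
              struct inotify_event *ev = (struct inotify_event *) (buf + i);
              size_t n = ev->len ? strlen (ev->name) : 0;
              if (n > 2 && strcmp (ev->name + n - 2, ".o") == 0)
                printf ("new object: %s\n", ev->name);  /* hand it to the linker */
              i += sizeof (struct inotify_event) + ev->len;
            }
        }
      return 0;
    }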

    Or, yes, a named pipe would work. My main concern with that approach is that it would require bigger changes to the build system.

  14. DonLindsay

    On second thought, we’re both wrong.

    There is a problem with wanting to know a set of objects up front. Larger build systems tend to do a lot of subsystem discovery, and may be well into the build proper before discovery is completed.

    And the problem with associating a link with an object file is that a given object file may have multiple destinations.

  15. Ian Lance Taylor

    Thanks for the note. It’s not necessary to work with every build system, and it’s not necessary to work from the start of the build–it’s OK to start the concurrent link at the point where you know the set of objects. I’m not proposing associating a link with an object file–I agree that that can’t work.
