The gold linker is multi-threaded. I’ll sketch the implementation I chose.
I wanted to avoid using lots of fine-grained mutexes, because they are both error-prone and expensive. Instead, I used a workqueue system. This requires breaking up the job into, essentially, a state machine. The whole link is defined as a series of relatively independent tasks. There are ordering requirements among the tasks.
For example, one task is “read the symbol table of object foo.o” and another is “add the symbols from foo.o to the global symbol table.” Naturally the second must follow the first. Also, while we can read the input symbol tables in any order, we must add them to the global symbol table in the order given on the command line.
The single main thread manages all locking for tasks. A task has a (possibly empty) set of locks which it must hold before it can run. These locks are not mutexes, but are simply a class which keeps track of who has locked it and who is waiting. When a task does start, it acquires its locks. This happens in the main thread, so no mutexes are required. The main thread then dispatches the task to a subsidiary thread. When the task is completed, it notifies the main thread (this requires a condition variable). The main thread is then responsible for freeing the locks, possibly moving other tasks to the ready queue.
The workqueue supports both locks and blocks. A lock is a shared or exclusive lock on some object, such as an input file. A block is a semaphore, with a count. Each time a task associated with a block is created (not run, just created), the block count is incremented. When the task completes, the block count is decremented. When the block count hits zero, it enables some other task to run. For example, this is how gold controls the task of mapping the input sections to the output sections, which can only happen after all the input object files have been read and added to the global symbol table. The block is decremented as each input task completes, and the mapping task waits for the block to hit zero.
In some ways this is a relatively simple approach, with a minimal number of mutexes. However, it does require some careful reasoning about what objects a task will require access to. Because there are no locks associated with objects, there is no convenient way for a task running in a subsidiary thread to assert that it has access to the object. In order to avoid race conditions, the code simply has to be correct. And that means that anybody working on the code has to understand the locking strategy.
While I think this is a reasonable approach, I do not yet know how maintainable it will be over time. State machine implementations can be difficult for people to understand, and the high-level locking is vulnerable to low-level errors. I know that one of my characteristic programming errors is a tendency toward code that is overly complex, which requires global information to understand in detail. I’ve tried to avoid it here, but I won’t know whether I succeeded for some time.
It would certainly be simpler if I could have written the code in a straightforward fashion rather than breaking it up into tasks, and if I had tools to somehow make that run in parallel where possible. But that seems to be a pipe dream.