Various gcc developers are working on Link Time Optimization, also known as Whole Program Optimization. When using LTO, gcc will write the intermediate representation into the object file. At link time, gcc will read it in and use it to implement cross-module optimizations. The canonical optimization which requires LTO is inlining functions across object file boundaries, which can be a very effective optimization for many programs. Most modern compilers support a version of LTO; gcc is a late-comer here, but that is typical for gcc.
However, the main issue with LTO is speed. If it takes too long to compile a program in LTO mode, then nobody will use it. That is the fate of gcc’s current whole program optimization mode, selected by -combine. The -combine option is also currently restricted to be C only, although the author, Geoff Keating, has done some work toward extending it to C++.
In order to make LTO fast enough to use, we need to make reading the IR during the LTO phase as fast as possible. We need to make decisions about inlining and other optimizations as fast as possible, which means that we need to have compact summary information about different object files, and we need to be able to process it quickly. Once we have made decisions about what to optimize, we need to implement those optimizations as quickly as possible. We can not assume that we will be able to hold the IR for all functions in memory; that will not be feasible for a reasonably sized program. We must be prepared to read and discard IR as necessary.
Our goal for this has to be that we can apply LTO to a large program in a few minutes at most. Anything more is going to be useless in practice. This is going to require attention to speed at all levels. The IR and summary information must be designed for quick reading. The implementation of optimizations will have to be done in parallel where possible–e.g., we will need to be able to have the backend invoke distcc to complete the compilation steps.
In practice we will need to use profile information to drive the LTO optimizations. We will need to be able to only apply LTO to hot functions, in order to reduce the time and maximize the benefits.
There is a long way to go to make all this happen.