Compiler to Assembler

The gcc compiler has always worked by writing out assembly code in text format. The assembler reads this text file to produce an object file. Most compilers work this way, although there have been some exceptions such as the MIPS compiler used on Irix.

Clearly this process of producing text and then parsing it again is inefficient. When the GNU assembler was first written, it read a text format that was very precisely specified: spacing had to be exact, and was mostly omitted, no comments were permitted, and so on. A special directive, #APP, told the assembler to go back to a normal parsing mode, implemented by an input scrubber that converted more free-form text into the precise form. The idea was that the compiler would generate the precise form, reducing the parsing costs in the assembler, while still permitting users to write assembler code by hand for use in asm statements.
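To make the scrubber idea concrete, here is a toy sketch in Python. It is not the GNU assembler's actual scrubber, and the "precise form" it produces (single spaces between tokens, no spaces around commas, no comments) is only my assumption for illustration. The function name scrub and the comment_char parameter are made up, and the sketch ignores string literals and target-specific syntax, which a real scrubber has to handle.

```python
import re

def scrub(line, comment_char="#"):
    """Toy input scrubber: turn free-form assembly text into a stricter form.

    Assumptions (illustrative only): comments start at comment_char and run
    to end of line, and the line contains no string or character literals.
    """
    # Drop the comment, if there is one.
    code = line.split(comment_char, 1)[0]
    # Collapse every run of blanks and tabs into a single space and
    # strip leading and trailing whitespace.
    code = " ".join(code.split())
    # Remove any whitespace around the commas that separate operands.
    return re.sub(r"\s*,\s*", ",", code)
```

With a scrubber like this in front of it, the parser proper only ever sees the strict form. A compiler that emits the strict form directly can skip the scrubbing pass entirely, which was the original point of the precise mode.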

This idea still exists in the assembler, but it has been lost to some extent. By default, today, the assembler will accept input with arbitrary spacing, comments, etc. This can be disabled using the -f option. The assembler used to also accept a #NO_APP directive to go back to precise mode, but that is now only effective if it appears in the first line of the file. On GNU/Linux, the compiler does neither of these by default. Oddly, it does generate #NO_APP after an asm statement, where the assembler ignores it. The assembler is also now built in a way that permits some whitespace even in precise mode, whitespace that was, for a while, not permitted.

What this tells me is that nobody cares very much about how long it takes the assembler to parse text. This is not unreasonable, since the assembler is in fact quite fast, and is not a major part of overall compilation time. Still, time spent parsing in the assembler is time lost.

In 1997 David Henkel-Wallace at Cygnus proposed converting the GNU assembler into a library which would be invoked directly by the compiler. That plan was never implemented in gcc. In today’s world it no longer makes much sense. Running the compiler and assembler as separate processes takes better advantage of today’s multicore machines. It’s hard to make a compiler multi-threaded; let’s not take away the one limited form of multi-threading that we already have.

What does make sense is using a structured data format, rather than text, to communicate between the compiler and the assembler. In gcc’s terms, the compiler should generate insn patterns with associated operands. The assembler should piece those together into its own internal data structures. In fact, of course, ideally gcc and the assembler would use the same internal data structure. It would be interesting to design such a structure so that it could be transmitted in a file, or over a pipe, or in shared memory.
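As a rough sketch of what such a structure might look like on the wire, here is a small Python example. Everything in it is hypothetical: the record layout, the Insn class, the REG and IMM operand kinds, and the idea that both sides share an opcode table are my own assumptions, not anything gcc or the GNU assembler defines. A real format would also need symbols, relocations, sections, and directives.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format, little-endian with no padding:
#   header:  16-bit opcode id, 8-bit operand count
#   operand: 8-bit kind, 64-bit signed value
HEADER = struct.Struct("<HB")
OPERAND = struct.Struct("<Bq")

REG, IMM = 0, 1  # illustrative operand kinds

@dataclass
class Insn:
    opcode: int      # index into an opcode table shared by both sides (assumed)
    operands: list   # list of (kind, value) tuples

def encode(insn):
    """Pack one instruction record into bytes."""
    out = HEADER.pack(insn.opcode, len(insn.operands))
    for kind, value in insn.operands:
        out += OPERAND.pack(kind, value)
    return out

def decode(buf, offset=0):
    """Unpack one record starting at offset; return (insn, next offset)."""
    opcode, count = HEADER.unpack_from(buf, offset)
    offset += HEADER.size
    operands = []
    for _ in range(count):
        operands.append(OPERAND.unpack_from(buf, offset))
        offset += OPERAND.size
    return Insn(opcode, operands), offset
```

Because each record carries its own operand count, the bytes can be written to a file, streamed over a pipe, or placed in shared memory, and the reader can walk them one record at a time without any text parsing.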

I think the time savings would happen less on the assembler side than on the gcc side: gcc would no longer have to format the output.

I think this would have the potential to cut compilation time by 5 to 10 percent. Not a big savings for the effort required, which is why nobody has done it. Compilation time is less important these days due to the use of big compilation clusters, but programs are also getting bigger, so it is not wholly unimportant.



Comments

3 responses to “Compiler to Assembler”

  1. laurynas

    “Running the compiler and assembler as separate processes takes better advantage of today’s multicore machines. It’s hard to make a compiler multi-threaded; let’s not take away the one limited form of multi-threading that we already have.”

    What about the long-completed cccp to libcpp transition then? Would it make sense to have separate cccp again?

  2. Ian Lance Taylor

    Hi Laurynas. An interesting point. It might make sense if the data were transferred in a structured manner rather than via text. An important difference, though, is that the compiler preprocesses the entire input stream before it does much else, so that it can do inter-procedural optimizations. The assembler output, on the other hand, is emitted one function at a time. In other words, the preprocessor has to run entirely before the compiler does most of its work, but both the assembler and the compiler can do useful work simultaneously. So my guess is that the payoff from separating the preprocessor would be lower.

  3. tromey

    > What this tells me is that nobody cares very much about how long it takes the assembler to parse text

    I am not so sure about this particular conclusion. Instead I think that most GCC developers, and therefore presumably most organizations funding GCC development, care more about optimizations and (maybe) language spec conformance than they do about GCC’s performance. IOW: people have complained for years about GCC’s performance but nobody has really seriously worked on the problem. Unless you count LLVM 😉

    About the preprocessor: I think integrating it provided a speedup on uniprocessor machines. However, the experiment is easy to do — run gcc -E and feed the output to gcc. I’d be interested in the results.
