Archive for November, 2009

Compiler to Assembler

The gcc compiler has always worked by writing out assembly code in text format. The assembler reads this text file to produce an object file. Most compilers work this way, although there have been some exceptions such as the MIPS compiler used on Irix.

Clearly this process of producing text and then parsing it again is inefficient. When the GNU assembler was first written, it read a text format that was very precisely specified. Whitespace had to be exact and was generally omitted, no comments were permitted, and so on. A special directive, #APP, told the assembler to go back to a normal parsing mode, which was implemented by an input scrubber that converted more free-form text into the precise form. The idea was that the compiler would generate this precise form, reducing the parsing costs in the assembler, while still permitting users to write assembler code by hand for use in asm statements.

This idea still exists in the assembler, but it has been lost to some extent. By default, today, the assembler will accept input with arbitrary spacing, comments, etc. This can be disabled using the -f option. The assembler used to also accept a #NO_APP directive to go back to precise mode, but that is now only effective if it appears in the first line of the file. On GNU/Linux, the compiler does neither of these by default. Oddly, it does generate #NO_APP after an asm statement, where the assembler ignores it. The assembler is also now built in a mode that permits some whitespace even in precise mode, which for a while was not permitted.
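To make the original #APP/#NO_APP scheme concrete, here is a toy sketch in Python. It is not how the GNU assembler's scrubber is actually written; the function names, the choice of `#` as the comment character, and the normalization rules are all assumptions made for illustration. Lines outside an #APP region are passed through untouched, on the assumption that the compiler already wrote them in precise form, and only the hand-written region pays for cleanup.

```python
def scrub_line(line, comment_char="#"):
    """Normalize one free-form line (toy rules, not the real gas rules)."""
    # Drop a trailing comment. This toy version does not handle a
    # comment character that appears inside a string literal.
    code = line.split(comment_char, 1)[0]
    # Collapse runs of whitespace to single spaces and trim the ends.
    code = " ".join(code.split())
    # Remove whitespace around operand separators.
    return ",".join(part.strip() for part in code.split(","))


def scrub(text):
    """Return the lines of `text` with #APP regions normalized."""
    out = []
    free_form = False  # assumption: the input starts in precise mode
    for line in text.splitlines():
        marker = line.strip()
        if marker == "#APP":
            free_form = True
            continue
        if marker == "#NO_APP":
            free_form = False
            continue
        if free_form:
            cleaned = scrub_line(line)
            if cleaned:  # skip lines that were blank or only a comment
                out.append(cleaned)
        else:
            out.append(line)  # precise mode: trusted, passed through as-is
    return out


source = "movl $1,%eax\n#APP\n   addl   $2 ,  %eax   # hand-written\n\n#NO_APP\nret"
result = scrub(source)
```

The point of the design is visible in the `else` branch: in precise mode there is nothing to do per line, so the cost of scrubbing is only paid for text a human wrote.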

What this tells me is that nobody cares very much about how long it takes the assembler to parse text. This is not unreasonable, since the assembler is in fact quite fast, and is not a major part of overall compilation time. Still, time spent parsing in the assembler is time lost.

In 1997 David Henkel-Wallace at Cygnus proposed converting the GNU assembler into a library which would be invoked directly by the compiler. That plan was never implemented in gcc. In today’s world it no longer makes much sense. Running the compiler and assembler as separate processes takes better advantage of today’s multicore machines. It’s hard to make a compiler multi-threaded; let’s not take away the one limited form of multi-threading that we already have.

What does make sense is using a structured data format, rather than text, to communicate between the compiler and the assembler. In gcc’s terms, the compiler should generate insn patterns with associated operands. The assembler should piece those together into its own internal data structures. In fact, of course, ideally gcc and the assembler would use the same internal data structure. It would be interesting to design such a structure so that it could be transmitted in a file, or over a pipe, or in shared memory.
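As a rough illustration of what such a structured format might look like, here is a small Python sketch. Everything in it is hypothetical: the record layout, the operand kinds, and the pattern numbers are invented for this example and do not correspond to anything in gcc or the GNU assembler. Each instruction is a pattern number plus a list of typed operands, packed into bytes that could equally be written to a file, a pipe, or shared memory.

```python
import struct

# Hypothetical wire format (little-endian, no padding):
#   header:  uint16 pattern id, uint8 operand count
#   operand: uint8 kind, int64 value
HEADER = struct.Struct("<HB")
OPERAND = struct.Struct("<Bq")

REG, IMM = 0, 1  # made-up operand kinds


def encode_insn(pattern_id, operands):
    """Pack one instruction record as bytes."""
    parts = [HEADER.pack(pattern_id, len(operands))]
    parts += [OPERAND.pack(kind, value) for kind, value in operands]
    return b"".join(parts)


def decode_insns(buf):
    """Unpack a byte stream of records into (pattern_id, operands) tuples."""
    insns = []
    offset = 0
    while offset < len(buf):
        pattern_id, count = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        operands = []
        for _ in range(count):
            operands.append(OPERAND.unpack_from(buf, offset))
            offset += OPERAND.size
        insns.append((pattern_id, operands))
    return insns


# "Compiler" side: emit two records; "assembler" side: read them back.
stream = encode_insn(7, [(REG, 0), (IMM, 1)]) + encode_insn(9, [])
decoded = decode_insns(stream)
```

Neither side formats or parses any text here: the producer packs fixed-size fields and the consumer reads them at known offsets, which is where the time savings would come from.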

I think the time savings would happen less on the assembler side than on the gcc side: gcc would no longer have to format the output.

I think this would have the potential to cut compilation time by 5 to 10 percent. Not a big savings for the effort required, which is why nobody has done it. Compilation time is less important these days due to the use of big compilation clusters, but programs are also getting bigger, so it is not wholly unimportant.

Comments (3)


It’s been a year. I’m back. I don’t have much new to say, but I’ve been starting to write blog entries in my head, so I might as well write them here. I’ll be aiming for three posts a week for now.

I’ll start with a few quick comments that came to mind as I skimmed my blog posts from November 2007 to November 2008.

  • Obviously, Obama did win, and he did increase spending on infrastructure, and it does seem to have helped end the recession. Open questions are how long unemployment will stay high (employment historically lags economic recovery) and whether there will be any structural reforms to prevent the same sort of thing from happening again in a few years.
  • We were able to trap the feral mother cat several months later and had her neutered as well. No new feral kittens have been seen on our street for some time.
  • Everything continues to get more complicated.
  • Iraq is doing much better than I ever thought it would, and Moktada al-Sadr seems to have disappeared. Was the U.S. right to invade? Was it worth the cost? I haven’t seen much discussion of these questions recently.
  • On the other hand lots of people are discussing whether it’s worth it for the U.S. to keep pushing that Sisyphean stone in Afghanistan. How long should the U.S. continue? What does success even look like?
  • What’s up with Israel and settlements? Just stop building new ones, already.
  • The Watchmen movie. It was better than I thought it would be. I’m not sure it was actually good, but I did enjoy it. The credit sequence was really interesting–in fact, I just saw the same idea in Zombieland.
  • The gcc in C++ work is now available as a configure option in gcc mainline. I’m letting it rest before I make the next push toward making it the default.
  • The gold linker seems to be getting a fair amount of use, judging by the bug reports. The most contentious issue is different handling of linking against shared libraries which themselves refer to other shared libraries not mentioned in the link.
  • Robert Zemeckis is making yet another movie using technology that fails the uncanny valley test. I won’t be seeing it.

Comments (4)
