Development of the gcc compiler faces recurring tension between people who want gcc to generate better code and people who want gcc to support their style of programming. The languages which gcc supports all have standards which are intended to provide an interface between the programmer and the compiler. The programmer can assume that the compiler implements the standard, and the compiler can assume that the programmer will not rely on behaviour which the standard declares to be undefined.
Unfortunately, especially for C, programmers make assumptions that have historically been true for C compilers but are not guaranteed by the standard. This leads to trouble. Some examples of trouble in gcc development are:
- Type-based aliasing. The standard says, basically, that a memory location written through a pointer of one type may not be read through a pointer of a different type (ignoring qualifiers like const). gcc takes advantage of this to rearrange memory loads and stores, which is desirable because it increases scheduling flexibility. Unfortunately, this breaks a fair amount of code. While gcc enables type-based aliasing by default when optimizing, it provides an option to disable it: -fno-strict-aliasing. In order to better detect code that may break, I suggested a way to warn about such code; the warning was implemented by Silvius Rus at Google, and will be in gcc 4.3.
- Undefined signed overflow. The standard says that overflow in a signed computation is undefined behaviour. gcc takes advantage of this in various ways, notably to estimate how many times a loop will be executed. Unfortunately, this breaks code which assumes that signed overflow wraps using modulo arithmetic. I implemented a -fno-strict-overflow option to disable the assumption, and I also implemented a -Wstrict-overflow option to warn about cases where gcc is making the assumption. These options are both in gcc 4.2.1.
- Inline assembler. It’s difficult to write gcc inline assembler correctly, because it’s difficult to express the constraints and dependencies. If you fail to indicate exactly what your inline assembler requires and precisely what it does, gcc can and will rearrange the code such that it does not work. The documentation of inline assembler is not great.
- Changes to the inliner. This mainly affects kernel developers. The kernel more or less requires that certain functions be inlined (this may have been more true in the past than it is today). gcc does not provide any guarantees about the behaviour of the inliner, and there have been cases where it has changed such that functions which should be inlined no longer are. gcc does provide the always_inline and noinline attributes, which can be used to control this.
- Compilation speed. The compiler has gotten somewhat slower over the years, and it's hard to speed it up. Periodically users run across some horrible test case that gcc compiles extremely slowly.
The flip side of these usability issues is, of course, that gcc tries to take advantage of the flexibility the standard permits in order to generate better code. What tends to happen is that some gcc developer will find a way to speed up a specific test case. He or she will implement the optimization in gcc. This will quietly break some formerly working code which unknowingly relied on undefined behaviour. The programmer won't consider the formerly working code to be undefined, or will feel that gcc should define it anyhow. This leads quickly to an argument in which the gcc developers say that the standard is clear and ask for a clear proposal for how the standard should work instead, while the user says that the code is clear and the compiler is crazy. These arguments have a characteristic pattern in which both sides talk past each other because they are talking about entirely different things: one talks about standards, the other about existing code. The usual outcome is a new compiler option.
It would be easier if gcc weren’t required to be all things to all people. Some people want gcc to generate better code, and compare it to other compilers. Some people want gcc to continue compiling their current code in the same way. These desires are contradictory. gcc developers tend to pay more attention to people who want better code, because that work is more interesting.
I do feel that most of the issues get resolved. The larger problem is not the issues themselves but the bad blood left behind by the arguments. Since the arguments tend to talk past each other, it is not easy to find common ground. If there were another free compiler which was more responsive to user suggestions, it would have a good chance of supplanting gcc. It will be interesting to see whether the gcc development community is able to respond to compilers like LLVM and Open64.
Part of the problem is that there is no person who can speak for gcc. I’m not going to name names, but several of the people who participate in these arguments are abrasive and not inclined to look for common ground. This does a lot of harm to gcc over time in the larger free software community. gcc’s organizational structure, with a steering committee which does not participate directly in development, means that nobody is actually in charge. Many people are paid to work on gcc (including me), but this naturally means that most of them are required to focus on areas of interest to their employers, rather than on the gcc project as a whole.
Despite all this, I’m generally optimistic about the future of gcc. There is a lot of good work being done by a lot of people. The gcc community cooperates well within itself, and rarely gets stuck on non-technical issues. It’s easy to focus on the negative. It’s harder to notice that gcc continues to develop quickly and continues to respond to users.