Gcc vs. Users

Development of the gcc compiler faces recurring tension between people who want gcc to generate better code and people who want gcc to support their style of programming. The languages which gcc supports all have standards which are intended to provide an interface between the programmer and the compiler. The programmer can assume that the compiler implements the standard, and the compiler can assume that the programmer will not rely on behaviour which the standard declares to be undefined.

Unfortunately, especially for C, programmers make assumptions that have historically been true for C compilers but are not guaranteed by the standard. This leads to trouble. Some examples of trouble in gcc development are:

  • Type-based aliasing. The standard says, basically, that a memory location written through a pointer of one type may not be read through a pointer of a different type (ignoring qualifiers like const). gcc takes advantage of this to rearrange memory loads and stores, which is desirable to increase scheduling flexibility. Unfortunately, this breaks a fair amount of code. While gcc enables type-based aliasing by default when optimizing, it provides an option to disable it: -fno-strict-aliasing. In order to better detect code that may break, I suggested a way to warn about such code; the warning was implemented by Silvius Rus at Google, and will be in gcc 4.3.
  • Undefined signed overflow. The standard says that overflow in a signed computation is undefined behaviour. gcc takes advantage of this in various ways, notably to estimate how many times a loop will be executed. Unfortunately, this breaks code which assumes that signed overflow wraps using modulo arithmetic. I implemented a -fno-strict-overflow option to disable the assumption, and I also implemented a -Wstrict-overflow option to warn about cases where gcc is making the assumption. These options are both in gcc 4.2.1.
  • Inline assembler. It’s difficult to write gcc inline assembler correctly, because it’s difficult to express the constraints and dependencies. If you fail to indicate exactly what your inline assembler requires and precisely what it does, gcc can and will rearrange the code such that it does not work. The documentation of inline assembler is not great.
  • Changes to the inliner. This mainly affects kernel developers. The kernel more or less requires that certain functions be inlined (this may have been more true in the past than it is today). gcc does not provide any guarantees about the behaviour of the inliner. There have been cases where it has changed such that functions which should be inlined no longer are. gcc does provide always_inline and noinline attributes which can be used to control this.
  • Compilation speed. The compiler has gotten somewhat slower over the years. It’s hard to speed it up. Periodically users run across some horrible test case.
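
To make the first two items concrete, here is a minimal sketch of how code can avoid relying on either assumption. The function names float_bits and add_would_overflow are made up for illustration, and the sketch assumes that float is 32 bits wide, which is typical but not something the standard guarantees. The first function copies bytes with memcpy instead of reading a float through a pointer of a different type; the second checks for overflow before the addition rather than testing for wraparound afterwards.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Type-based aliasing: *(uint32_t *)&f would read a float object through
   a pointer of a different type, which the standard does not permit.
   Copying the bytes with memcpy is well defined. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Signed overflow: a test like "a + b < a" relies on wraparound, which is
   undefined for signed types. Compare against the limits before adding,
   so that no intermediate result can overflow. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow when b <= 0 */
}
```

Code written this way should behave the same with or without -fno-strict-aliasing and -fno-strict-overflow, since it does not depend on behaviour the standard leaves undefined.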

The flip side of these usability issues is, of course, that gcc tries to use the flexibility the standard permits in order to generate better code. What tends to happen is that some gcc developer will find a way to speed up a specific test case. He or she will implement the optimization in gcc. This will quietly break some formerly working code which unknowingly relied on undefined behaviour. The programmer won’t consider the formerly working code to be undefined, or will feel that gcc should define it anyhow. This leads quickly to an argument in which the gcc developers say that the standard is clear and ask for a clear proposal for how the standard should work instead, while the user says that the code is clear and the compiler is crazy. These arguments have a characteristic pattern in which both sides talk past each other because they are talking about entirely different things: one talks about standards, the other about existing code. The usual outcome is a new compiler option.

It would be easier if gcc weren’t required to be all things to all people. Some people want gcc to generate better code, and compare it to other compilers. Some people want gcc to continue compiling their current code in the same way. These desires are contradictory. gcc developers tend to pay more attention to people who want better code, because that work is more interesting.

I do feel that most of the issues get resolved. The larger problem is not the issues themselves, it is the bad blood left behind by the arguments. Since the arguments tend to talk past each other, it is not easy to find common ground. If there were another free compiler which was more responsive to user suggestions, it would have a good chance of supplanting gcc. It will be interesting to see whether the gcc development community is able to respond to compilers like LLVM and Open64.

Part of the problem is that there is no person who can speak for gcc. I’m not going to name names, but several of the people who participate in these arguments are abrasive and not inclined to look for common ground. This does a lot of harm to gcc over time in the larger free software community. gcc’s organizational structure, with a steering committee which does not participate directly in development, means that nobody is actually in charge. Many people are paid to work on gcc (including me), but this naturally means that most of them are required to focus on areas of interest to their employers, rather than on the gcc project as a whole.

Despite all this, I’m generally optimistic about the future of gcc. There is a lot of good work being done by a lot of people. The gcc community cooperates well within itself, and rarely gets stuck on non-technical issues. It’s easy to focus on the negative. It’s harder to notice that gcc continues to develop quickly and continues to respond to users.

9 Comments »

  1. ncm said,

    October 29, 2007 @ 11:28 pm

    The situation with FFTW was rather simpler: they complained that each release since … 2.7, I think, produced slower floating-point code in their innermost loops, on x86, than 2.7. (As I recall, post-2.7 generated loops with unnecessary register spills.) Last I heard was 3.3, but that’s a lot of releases. They were talking then about giving up and generating assembly code instead. I don’t know if they did, or whether 4.x (for any x) does any better.

    Can’t Mark Mitchell, as Project Leader, speak for Gcc? If it has a future, I’m inclined to give him a lot of the credit for it.

  2. Ian Lance Taylor said,

    October 30, 2007 @ 6:24 am

    4.x normally does better than 3.x. That said, I have no idea about FFTW in particular.

    Mark Mitchell isn’t Project Leader, he’s Release Manager. There is no Project Leader. That said, I think Mark probably could speak for gcc if he chose to assert himself. However, he doesn’t. He prefers a more consensus oriented approach, which is the right way to operate within the gcc community but is less helpful when dealing with people who are outside that community. Also Mark is very busy and tends to lag discussions by several days at best, so we see whole discussion firestorms before he has anything to say.

  3. fche said,

    October 30, 2007 @ 6:58 am

    It seemed that the optimization under current controversy may have been misguided for reasons other than just breaking existing code. It adds memory accesses that weren’t there before, betting that this is a good trade against a branch/pipeline flush. This bet however is highly suspect – cache miss latencies can exceed modern processor pipeline flush times, and can get even worse with SMP. If there was anything other than a toy microbenchmark posted to justify this transform, I missed it.

  4. Ian Lance Taylor said,

    October 30, 2007 @ 7:40 pm

    It’s hard to know for sure. It’s true that the cache miss latency is higher than the branch penalty. But the difference is that it’s a cache miss for a store, so all that latency will normally be hidden from the program–while the store is taking place, the program will continue to execute. It would be an issue if it causes cache churn, or if the processor is out of memory slots.

    You are certainly right that it’s not an obvious win in all cases.

  5. tromey said,

    October 31, 2007 @ 10:18 am

    I do tend to talk about the negatives more, because they are the things requiring change. Acceptance of their existence is the first step… but naturally, when discussing solutions I think we must also try to avoid breaking the parts of the process that work well. This isn’t easy, since easy-to-fix problems tend to be fixed immediately.

    The tone of the recent discussion says to me that the bridges have already been burnt. Kernel developers, at least the vocal ones, have given up on GCC. My concern is that we’re in the process of eroding the trust that other developer communities have. (Speaking of which, I also hear many complaints about C++ — ABI breakages from several years ago still get complaints.)

    I agree that the lack of a central maintainer is a problem. It is hard to picture who could fill that role, though, especially given GCC’s history. The abrasive nature of some people on the GCC list partly stems from this problem; although things have been better in recent years, the community would be a bit better off if it required people to temper their tone a bit.

    FWIW I don’t really agree that nobody is in charge of GCC. This is what GCC developers say, but in reality the global write maintainers have an enormous say over the direction of the compiler — at least in terms of veto power. I think there are social reasons that this is not openly discussed, but it doesn’t make it less true 🙂

    I dunno, maybe I’m looking for doom. I’m not as optimistic about GCC as you seem to be, but neither am I overly pessimistic. On the plus side I think GCC is fairly open technically — everybody knows there are problems but there generally isn’t much resistance when someone codes up an improvement. General cleanups are under-invested, but that seems to be an industry-wide phenomenon.

    On the minus side I would put certain developers’ email styles, an occasionally unwieldy patch approval process (or, really, too few maintainers), and lack of transparency and change in the SC (though I liked what I heard from the SC at the summit, and this made the committee not seem quite as bad).

  6. Ian Lance Taylor said,

    November 1, 2007 @ 8:17 pm

    The global write maintainers are in charge in a sense, in that they can move gcc in whatever direction they choose. However, in practice, they don’t. Most of the global write maintainers are inactive. The most active global write maintainer is Mark Mitchell, and he confines himself to the C++ frontend. While the global write maintainers have veto power, they don’t use it. The closest they ever come is a pocket veto: some patches never get approved.

    So I would say that while the structure of gcc permits the global write maintainers to be in charge, in practice they are not.

    The steering committee has evidently decided not to create any new global write maintainers. Perhaps they are concerned about this very issue.

  7. Manu said,

    November 9, 2007 @ 9:38 am

    Hi Ian,

    I think that the key point is people talking past each other. I understand that GCC developers have little time but sometimes I think there is a lack of explanations and patience.

    Saying “that is undefined by the standard” or “that is allowed by the standard” does not satisfy users. And most of the time the *real* reasons are far simpler and easier to understand if explained properly. Typically they are that doing otherwise: “will break other people’s code”, “will reduce performance for other people’s code”, “will reduce the performance of gcc” or “will add excessive complexity to gcc”. Or simply, “sorry, we would like to, but we cannot do that yet”.

    I don’t think GCC needs someone in charge, a Project Leader. But GCC needs someone who can speak for GCC. Someone who has time to talk privately with people (users and developers) and work out what the real issues are, what the possible solutions are, and how to find a compromise. Someone who acts as a middle-man between busy GCC devs and users, and also sometimes between GCC devs when things get a bit personal (and that has happened on a few occasions), in order to avoid people talking past each other.

  8. Ian Lance Taylor said,

    November 9, 2007 @ 6:28 pm

    Thanks for the note. I agree that it would be useful to have somebody to keep issues running more smoothly both inside and outside of the project. It’s a difficult and time consuming role, though, and probably not a very rewarding one for most people. It’s not one which any company is likely to pay for, except conceivably for an organization like OSDL. Also, the only way for that person to really change things would be to get gcc developers to make changes in some cases, and we know that that can also be very hard. So while I hope that some such person appears, I don’t see how we can count on it or plan for it.

  9. January 11, 2008 « Everything is Data said,

    May 1, 2009 @ 4:16 pm

    […] The broader point here is that while this optimization is completely legal according to the C standard, it is inconsistent with the traditional C semantics, and runs the risk of breaking code that depends on integer overflow having the expected behavior. At least GCC now provides a flag to emit warnings for potentially broken code, which IMHO is a prerequisite for doing aggressive optimizations of this type. There’s another interesting post on Ian Lance Taylor’s blog that discusses this situation in general (e.g. alias optimizations are another instance where the C standard contradicts the traditional expectations of C programmers). […]
