gccgo panic/recover

Back in March Go picked up a dynamic exception mechanism. Previously, when code called the panic function, it simply aborted your program. Now, it walks up the stack to the top of the currently running goroutine, running all deferred functions. More interestingly, if a deferred function calls the new recover function, the panic is interrupted, and stops walking the stack at that point. Execution then continues normally from the point where recover was called. The recover function returns the argument passed to panic, or nil if there is no panic in effect.

I just completed the implementation of this in gccgo. It turned out to be fairly complex, so I’m writing some notes here on how it works.

The language requires that panic runs the deferred functions before unwinding the stack. This means that if the deferred function calls runtime.Callers (which doesn’t work in gccgo, but never mind, it will eventually) it gets a full backtrace of where the call to panic occurred. If the language did not work that way, it would be difficult to use recover as a general error handling mechanism, because there would be no good way to dump a stack trace. Building up a stack trace through each deferred function call would be inefficient.

The language also requires that recover only return a value when it is called directly from a function run by a defer statement. Otherwise it would be difficult for a deferred function to call a function which uses panic and recover for error handling; the recover might pick up the panic for its caller, which would be confusing.

As a general gccgo principle I wanted to avoid requiring new gcc backend features. That raised some difficulty in implementing these Go language requirements. How can the recover function know whether it is being invoked directly by a function started by defer? In 6g, walking up the stack is efficient. The panic function can record its stack position, and the recover function can verify that it is at the correct distance below. In gccgo, there is no mechanism for reliably walking up the stack other than exception stack unwinding, which does not provide a helpful API. Even if it did, gccgo’s split stack code can introduce random additional stack frames which are painful to account for. And there is no good way for panic to mark the stack in gccgo.

What I did instead was have the defer statement check whether the function is it deferring might call recover (e.g., it definitely calls recover, or it is a function pointer so we don’t know). In that case, the defer statement arranges to have the deferred thunk record the return address of the deferred function at the top of the defer stack. This value is obtained via gcc’s address-of-label extension, so no new feature was required. This gives us a value which a function which calls recover can check, because a function can always reliably determine its own return address via gcc’s __builtin_return_address function.

However, if the stack is split, then __builtin_return_address will return the address of the stack splitting cleanup code rather than the real caller. To avoid that problem, a function which calls recover is split into two parts. The first part is a small thunk which is marked to not permit its stack to be split. This thunk gets its return address and checks whether it is being invoked directly from defer. It passes this as a new boolean parameter to the real function, which does permit a split stack. The real function checks the new parameter before calling recover; if it is false, it just produces a nil rather than calling recover. The real function is marked uninlinable, to ensure that it is not inlined into its only call site, which could blow out the stack.

That is sufficient to let us know whether recover should return a panic value if there is one, at the cost of having an extra thunk for every function which calls recover. Now we can look at the panic function. It walks up the defer stack, calling functions as it goes. When a function sucessfully calls recover, the panic stack is marked. This stops the calls to the deferred functions, and starts a stack unwind phase. The stack unwinding is done exactly the way that g++ handles exceptions. The g++ exception mechanism is general and cross-language, so this part was relatively easy. This means that every function that calls recover has an exception handler. The exception handlers are all the same: if this is the function in which recover returned a value, then simply return from the current function, effectively stopping the stack unwind. If this is not the function in which recover returned a value, then resume the stack unwinding, just as though the exception were rethrown in C++.

This system is somewhat baroque but it appears to be working. Everything is reasonably efficient except for a call to recover which does not return nil; that is as expensive as a C++ exception. Perhaps I will think of ways to simplify it over time.

Comments

Proposition 16

California’s proposition 16, which will be voted on next Tuesday, is an interesting use of California’s bizarre ballot initiative process. The proposition says that if a local government wants to start a municipal electrical utility, it must get a 2/3 majority of votes. The proposition was initiated and almost entirely funded by PG&E, a California electric company, which has reportedly spent over $35 million on advertising in support of the proposition. I rarely watch television, but I’ve received quite a few flyers in the mail about it. Some even list a long set of candidates and positions to endorse, along with proposition 16. However, since these flyers are required to list the groups which explicitly endorsed the flyer, it’s easy to see that the flyers are being put out for proposition 16, and they are trying to slide in support for it along with other candidates I might be inclined to vote for anyhow.

PG&E is the only company from which I can buy my electricity. Electricity distribution is a classic natural monopoly; why would two different companies put up wires to everybody’s house? Since modern life requires electricity, and since I can only get it from PG&E, I am a PG&E customer. Proposition 16 appears to be designed solely to preserve PG&E’s monopoly position, by making it much harder for communities to create their own municipal power companies. It would be hard enough to get a majority vote in favor of government run power; I think we can assume that a 2/3 majority would be effectively impossible. It’s pretty darn annoying that PG&E is spending $35 million, including money they collect from me, on this. Is that a good use of my money?

Municipal power is not impossible. For example, Palo Alto, California, uses a municipal power utility. It was notable during the rolling blackouts that affected most of California in the early 2000s that Palo Alto was immune. So it’s not as though PG&E deserves to be protected because they are doing a particularly good job. When a real crunch time came, they did a very poor job indeed.

It’s difficult for me to imagine why anybody would vote in favor of such a blatant power grab by a private company. But then, of course, there’s the $35 million. The opponents of proposition 16 have reportedly raised less than $100,000. I assume that if PG&E succeeds we will see more and more cases where private companies spend lots of money on ballot initiatives in their favor. I hope that it fails, and if I have any readers in California I encourage you to vote against proposition 16 this Tuesday.

Comments (3)

Shrek Distances

I’m sure I’m not the only person troubled by the loose geography in the Shrek movies. In the first movie Shrek takes a couple of days to get to the dragon’s castle. In the second movie Shrek and Fiona appear to take a few days to get from Shrek’s swamp to Far Far Away by coach. However, in the fourth movie, Shrek walks from his swamp to the dragon’s castle and then to Far Far Away all in a single night. There is no explanation. How is this possible?

Even allegorical fairy tales are weakened by a lack of internal story consistency.

Comments

GCC in C++

I’m very pleased to see that the GCC steering committee has agreed to permit GCC to be written in C++. At one time RMS, who is a member of the steering committee, had felt that C++ was never appropriate for systems programs like GCC. It’s good to see that he has apparently come around.

There has been a long effort to prepare for this, by moving GCC’s code base from C to the common subset of C and C++. While people naturally think of C++ as an extension to C, they are different languages and there is a lot of C code which is not valid C++. In the GCC code base, one of the biggest issues was that enums are more restricted by the type system in C++ than they are in C. Another was that in C++ you may not use the same name as a typedef and a struct tag, except for the special case of making the struct tag be a typedef for the struct itself.

Gabriel Dos Reis did the first substantial work on moving the GCC code base to the common subset, and many other people contributed. I think it’s fair to say that I did the lion’s share of the work, starting with my surprise presentation on the advantages of C++ at the 2008 GCC Summit. I did a lot of work to improve the -Wc++-compat warning option to warn about C code which was not in the C/C++ common subset, and I did a lot of work to make GCC code compile with that option without warnings.

C++ will not magically make the GCC code base better. However, I believe that it will give us some useful tools to incrementally improve the code base over time, making it easier to read, easier to modify, and more efficient. I say this not based on theory, but on my experiences with gold and with the gccgo frontend. I’ve already started writing some draft C++ coding conventions which I hope we can use to guide our efforts.

Comments (4)

GCC Project

GCC as a free software project is clearly very successful. Over more than 20 years it’s grown from nothing to become the standard compiler for several operating systems and many microprocessors. So far in 2010 the core part of the compiler alone has seen over 1000 commits by over 100 contributors. GCC continues to get significant new features; e.g., the recent GCC 4.5 release includes a new link time optimization facility.

On the other hand, the GCC project has some problems. The major individual contributors to GCC are hired to work on it. That means that they have a lot of time and resources to use to improve the compiler, which is good. However, it also has some negative effects. It’s difficult for new volunteers to join the community. It’s hard for them to learn the code base and it’s hard for them to keep up with the pace of change. It’s also hard for them to learn the conventions of how the project works, and they get little help in getting their patches in. Also, the people who work on GCC have learned the intricacies of the code base over time. They do not rely on the internal documentation. The effect is that the internal documentation for some parts of the code base is quite weak, and none of the main contributors are motivated to fix it.

Another, separate, problem is that there is no single person or group with a clear ability and willingness to decide on the direction of the project. In the past the direction has been set at different times by people like Richard Stallman, Richard Kenner, Jeff Law, and Richard Henderson. None of them are playing that role today. The effect is that nobody can see whether significant new features should or should not go into the project, which leads to a tendency for inconclusive discussions and unreviewed patches. People hoping to contribute are left with no clear path forward. (I should mention that groups like the GCC Steering Committee and the GCC Release Managers are effective but do not take on this role, which is more that of an architect.)

A third problem is that GCC has no public relations activity. The project web page tells you what GCC is but says nothing about how it compares to other compilers or how it has improved over time. There are some common criticisms of GCC, such as the belief that it is measurably worse than proprietary compilers, or that it is stagnating, which the project makes no attempt to discuss or dispute.

None of these issues are critical. As I said, GCC is highly successful. But they are areas where I think GCC could improve. Unfortunately, pointing out these issues is insufficient; it’s necessary for peole to step up to take on these roles. The various companies which pay people to work on GCC are generally less interested in these aspects of the project, which makes it that much harder to find people to work on them.

Comments (7)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »