Archive for October, 2007

Better Holidays

I’m sure that many other people have noticed that stores are offering a lot more items with a Halloween theme these days, and that there are already Christmas themed items available on shelves. The Christmas stuff in particular seems a bit nuts, since it started appearing more than two months before Christmas.

However, I am not now going to complain about the ongoing commercialization of what was once a more contemplative holiday. In general, buying and selling is good for the economy, so using holidays to encourage people to spend money is by and large a good thing. (Assuming, of course, that people are also saving for their old age, which they are not, but that is a problem which the U.S. will have to deal with in some other way.)

The problems I see are 1) there aren’t enough holidays associated with buying things, and 2) many of the Christmas items tend to be useless junk, the consumption of which uses up natural resources and landfill space. So what we need to do is associate the transfer of money with more holidays, without encouraging people to buy junk. That means that we need to encourage everybody to buy services. Services are in any case a better economic stimulus, as all the money stays in the local area, and is generally quickly spent again by the service provider.

I think the most obvious existing holidays which could use some commercialization in the U.S. are Memorial Day and Labor Day.

Labor Day is early for harvest, but we could associate it with buying local farm produce. The idea would be something like “have a local food feast–invite friends, co-workers, neighbors.” That would keep most of the money in the local area, and encourage local food production, which is generally a good thing since it requires fewer resources. Of course Thanksgiving could also focus on local food, but despite being a harvest festival it is generally too late in the year, and it is already associated with specific foods which would be hard to change.

Memorial Day is already associated with the military, picnics, and barbecues. I’m not seeing much there. Maybe the theme could be preparation for summer. It could be a time to repaint the house or have a garage sale. Or maybe join a gym, change your haircut, get a manicure. Of course volunteering is also good, though it doesn’t lead to much in the way of transferring money. OK, I’m reaching. Perhaps somebody can figure something out here.

In general I think this could be a good approach: using our love of holidays, and of buying things, to help the economy in a tangible way.


Gcc vs. Users

Development of the gcc compiler faces recurring tension between people who want gcc to generate better code and people who want gcc to support their style of programming. The languages which gcc supports all have standards which are intended to provide an interface between the programmer and the compiler. The programmer can assume that the compiler implements the standard, and the compiler can assume that the programmer will not rely on behaviour which the standard declares to be undefined.

Unfortunately, especially for C, programmers make assumptions that have historically been true for C compilers but are not guaranteed by the standard. This leads to trouble. Some examples of trouble in gcc development are:

  • Type-based aliasing. The standard says, basically, that a memory location written to by a pointer of one type may not be read by a pointer of a different type (ignoring qualifiers like const). gcc takes advantage of this to rearrange memory loads and stores, which is desirable to increase scheduling flexibility. Unfortunately, this breaks a fair amount of code. While gcc enables type-based aliasing by default when optimizing, it provides an option to disable it: -fno-strict-aliasing. In order to better detect code that may break, I suggested a way to warn about such code; the warning was implemented by Silvius Rus at Google, and will be in gcc 4.3.
  • Undefined signed overflow. The standard says that overflow in a signed computation is undefined behaviour. gcc takes advantage of this in various ways, notably to estimate how many times a loop will be executed. Unfortunately, this breaks code which assumes that signed overflow wraps using modulo arithmetic. I implemented a -fno-strict-overflow option to disable the assumption, and I also implemented a -Wstrict-overflow option to warn about cases where gcc is making the assumption. These options are both in gcc 4.2.1.
  • Inline assembler. It’s difficult to write gcc inline assembler correctly, because it’s difficult to express the constraints and dependencies. If you fail to indicate exactly what your inline assembler requires and precisely what it does, gcc can and will rearrange the code such that it does not work. The documentation of inline assembler is not great.
  • Changes to the inliner. This mainly affects kernel developers. The kernel more or less requires that certain functions be inlined (this may have been more true in the past than it is today). gcc does not provide any guarantees about the behaviour of the inliner. There have been cases where it has changed such that functions which should be inlined no longer are. gcc does provide always_inline and noinline attributes which can be used to control this.
  • Compilation speed. The compiler has gotten somewhat slower over the years. It’s hard to speed it up. Periodically users run across a test case which the compiler takes an unreasonably long time to compile.
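To make the type-based aliasing rule concrete, here is a small sketch (my own example, not code from gcc itself) showing both the pattern that breaks under -fstrict-aliasing and a conforming rewrite:

```c
#include <stdint.h>
#include <string.h>

/* Undefined behaviour: reads the bytes of a float through a pointer
   to an incompatible type.  Under -fstrict-aliasing gcc may assume
   that stores through float* do not affect loads through uint32_t*,
   and reorder them.  -Wstrict-aliasing can warn about this pattern. */
static uint32_t float_bits_broken(float f)
{
    return *(uint32_t *)&f;
}

/* Well-defined alternative: copy the bytes with memcpy.  gcc
   typically compiles this to a single move, so there is no cost. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The memcpy version expresses the same type pun, but through a construct the standard defines, so the compiler’s aliasing assumptions cannot break it.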
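Similarly, a sketch of the signed overflow issue (again my illustration, not code from any bug report). The first function is the kind of test that silently changes meaning under the optimizer; the second asks the same question in a way the standard defines:

```c
#include <limits.h>

/* gcc may fold this to "return 1": since signed overflow is
   undefined, the compiler can assume i + 1 > i for every i.  Code
   expecting a false result at INT_MAX (wrapping) silently breaks
   unless compiled with -fno-strict-overflow or -fwrapv. */
static int next_is_larger(int i)
{
    return i + 1 > i;
}

/* Conforming alternative: check the bound before doing the
   arithmetic, so no signed overflow ever occurs. */
static int can_increment(int i)
{
    return i < INT_MAX;
}
```

-Wstrict-overflow will flag places where gcc actually uses the no-overflow assumption to simplify code like the first function.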
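And for the inliner, this is what the two attributes look like in practice (a minimal sketch; the function names are invented):

```c
/* always_inline makes it an error if the call cannot be inlined;
   gcc inlines such functions even at -O0.  It must be combined with
   the inline keyword. */
static inline int __attribute__((always_inline))
fast_path(int x)
{
    return x | 1;
}

/* noinline keeps the function out of line even at -O3, which is
   what kernel-style code uses to pin down the inliner's behaviour. */
static int __attribute__((noinline))
slow_path(int x)
{
    return x * 2;
}
```

With these attributes the code no longer depends on the inliner’s heuristics, which is the only behaviour gcc actually guarantees.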

The flip side of these usability issues is, of course, that gcc tries to take advantage of the flexibility it is permitted to generate better code. What tends to happen is that some gcc developer will find a way to speed up a specific test case. He or she will implement the optimization in gcc. This will quietly break some formerly working code which unknowingly relied on undefined behaviour. The programmer won’t consider the formerly working code to be undefined, or will feel that gcc should define it anyhow. This leads quickly to an argument in which the gcc developers say that the standard is clear and ask for a clear proposal for how the standard should work instead, while the user says that the code is clear and the compiler is crazy. These arguments have a characteristic pattern in which both sides talk past each other because they are talking about entirely different things: one talks about standards, the other about existing code. The usual outcome is a new compiler option.

It would be easier if gcc weren’t required to be all things to all people. Some people want gcc to generate better code, and compare it to other compilers. Some people want gcc to continue compiling their current code in the same way. These desires are contradictory. gcc developers tend to pay more attention to people who want better code, because that work is more interesting.

I do feel that most of the issues get resolved. The larger problem is not the issues themselves, it is the bad blood left behind by the arguments. Since the arguments tend to talk past each other, it is not easy to find common ground. If there were another free compiler which was more responsive to user suggestions, it would have a good chance of supplanting gcc. It will be interesting to see whether the gcc development community is able to respond to compilers like LLVM and Open64.

Part of the problem is that there is no person who can speak for gcc. I’m not going to name names, but several of the people who participate in these arguments are abrasive and not inclined to look for common ground. This does a lot of harm to gcc over time in the larger free software community. gcc’s organizational structure, with a steering committee which does not participate directly in development, means that nobody is actually in charge. Many people are paid to work on gcc (including me), but this naturally means that most of them are required to focus on areas of interest to their employers, rather than on the gcc project as a whole.

Despite all this, I’m generally optimistic about the future of gcc. There is a lot of good work being done by a lot of people. The gcc community cooperates well within itself, and rarely gets stuck on non-technical issues. It’s easy to focus on the negative. It’s harder to notice that gcc continues to develop quickly and continues to respond to users.


Social Networking

Web sites like Facebook, MySpace, LinkedIn, and Orkut are billed as community sites. You can connect to your friends, and find out what they are up to. It’s a good way to keep track of what people you know are doing. I have a page on three of these sites. Since I am not a very social person, I find the LinkedIn page to be most useful: people update it when they change jobs, and from time to time LinkedIn sends me an e-mail message about changes in the people to whom I am linked.

That’s nice, but it’s not all that exciting. What are these community sites really good for? It seems to me that they are good for keeping up with people who you don’t see every day, and they are good for meeting friends of friends. They could be good for getting recommendations for movies, books, etc., on the theory that you probably like the sorts of things your friends like. They’re a convenient way to have a home web page, though there are several other ways to do that. Arguably I am not the best person to see the use of these sites, since I am not a heavy user. But that fact in itself means that these sites aren’t that useful to some people.

Facebook is in the news these days because Microsoft paid them an amount of money which gives them a $15 billion valuation. Of course that is a very misleading number: Microsoft paid to place ads on Facebook, and Microsoft took a stake in the company. Microsoft will make money selling those ads. In order to determine the valuation that Microsoft put on the company, one needs to distinguish the amount Microsoft paid for ad placement and the amount Microsoft paid for a stake in the company. We don’t know those numbers, so we don’t know what Microsoft really thinks Facebook is worth.

Facebook is also in the news for providing an API for people to write applications which can then appear on their site. That is a nice idea. But really the most interesting thing about these sites is that they provide a network of people. The sites use that to provide information on the site itself, but it would be much more interesting if they used that network to make other sites more useful. For example, while it’s useful on LinkedIn to see where other people work, it would be much more useful to see what other web sites people in my network like. And this might be what it takes to make web annotations really work–a network of people whose annotations you can see.

So what Facebook really should be doing is exporting their network so that other people can write applications which use it. Of course there are some serious privacy issues, but they should be resolvable–Facebook would have to make sure that the contents of the network were not exposed directly, just via things like “propagate to all connections” or “are these two people connected?”

Once Facebook does that, though, there is less reason to spend time on the Facebook site itself. And since Facebook doesn’t seem to have a revenue plan other than advertising, it’s not clear how they will make any money by exposing this information (it’s not clear that they will make much money on advertising either, but it is at least possible). So while I hope they see their way clear to doing this, I don’t know whether they really will.


Single Threaded Memory Model

One more round on the parallel programming theme. There has been some recent discussion on the gcc and LKML mailing lists about an interesting case.

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int acquires_count = 0;

int trylock(void)
{
    int res;

    res = pthread_mutex_trylock(&mutex);
    if (res == 0)
        ++acquires_count;
    return res;
}

On x86 processors, current gcc will optimize this to use a comparison, an add with carry flag, and an unconditional store to acquires_count. This eliminates a branch, but it means that the variable is written to even if the lock is not held. That introduces a race condition.

This is clearly permitted by the C language standard, which in general describes a single threaded model. The standard says nothing about precisely when values are written to memory. It simply says that if you assign a value to a global variable, then when you read the global variable you should see that value.

It may seem that this should be an invalid optimization: the compiler should not move stores out of a conditionally executed basic block. Several people have reacted to this test case in this way. However, note that the first function call could be anything–it could even be a test of a global variable. And note that the store could be a load–we could load a global variable twice, protected by some conditional both times. If we require that this code be safe in a multi-threaded environment, then we can not coalesce the loads. That would be OK with people who write C as a fancy assembly language. But it would significantly hurt performance for complex programs, especially in C++, in which the conditionals would be in different functions and then inlined together.
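The load-coalescing case can be sketched like this (my own example, not code from the mailing list threads):

```c
static int flag = 5;

/* In the single-threaded model of the standard, the compiler may
   load "flag" once and reuse the value for both conditionals.  If
   another thread could change flag between the two tests, that
   coalescing becomes visible--but forbidding it would penalize all
   single-threaded code, especially inlined C++. */
static int sum_if(int a, int b)
{
    int r = 0;
    if (a)
        r += flag;
    if (b)
        r += flag;
    return r;
}
```

Nothing in this function looks like it touches shared state, which is exactly why a blanket rule against such transformations would be so costly.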

So what should the compiler do, given a choice between optimizing complex C++ code and helping to generate correct complex multi-threaded code? Complex C++ code is much more common than complex multi-threaded code. I think the compiler is making the right choice. And it conforms to the language standard.

For this code to be correct in standard C, the variable needs to be marked as volatile, or it needs to use an explicit memory barrier (which requires compiler-specific magic–in the case of gcc, a volatile asm with an explicit memory clobber). But many people don’t see that–including Linus Torvalds.
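Here is a sketch of those two fixes as they look in gcc (the function is a stand-in for the trylock example, with “acquired” playing the role of pthread_mutex_trylock succeeding; it is not real pthread code):

```c
/* A gcc compiler barrier: the empty asm claims to clobber all of
   memory, so the compiler will not move loads or stores across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* volatile forces the compiler to perform exactly the accesses
   written in the source: it may not introduce a store on the path
   where the condition is false. */
static volatile int acquires_count = 0;

static int count_if_acquired(int acquired)
{
    if (acquired) {
        ++acquires_count;
        barrier();  /* also fences the compiler at this point */
    }
    return acquired;
}
```

Either mechanism alone is generally enough to stop the speculative store; using both is belt and braces. Note that neither is portable standard C beyond volatile itself–the barrier is a gcc extension.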

I think this is a nice example of my earlier point, which is that our current models of parallel programming are generally too hard for people to use.


Gold Workqueues

The gold linker is multi-threaded. I’ll sketch the implementation I chose.

I wanted to avoid using lots of fine-grained mutexes, because they are both error-prone and expensive. Instead, I used a workqueue system. This requires breaking up the job into, essentially, a state machine. The whole link is defined as a series of relatively independent tasks. There are ordering requirements among the tasks.

For example, one task is “read the symbol table of object foo.o” and another is “add the symbols from foo.o to the global symbol table.” Naturally the second must follow the first. Also, while we can read the input symbol tables in any order, we must add them to the global symbol table in the order given on the command line.

The single main thread manages all locking for tasks. A task has a (possibly empty) set of locks which it must hold in order to run. These locks are not mutexes, but are simply a class which keeps track of who has locked it and who is waiting. When a task does start, it acquires its locks. This happens in the main thread, so no mutexes are required. The main thread then dispatches the task to a subsidiary thread. When the task is completed, it notifies the main thread (this requires a condition variable). The main thread is then responsible for freeing the locks, possibly moving other tasks to the ready queue.
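The lock bookkeeping might look roughly like this (a sketch with invented names, not gold’s actual classes; written in C here for consistency with the earlier examples, though gold itself is C++):

```c
#include <assert.h>

/* A task "lock": plain counters, no mutex, because only the single
   main thread ever reads or writes them. */
struct task_lock {
    int writer;    /* 0 or 1: is there an exclusive holder? */
    int readers;   /* number of shared holders */
};

static int lock_can_grab_exclusive(const struct task_lock *l)
{
    return !l->writer && l->readers == 0;
}

/* Called only from the main thread, before dispatching a task. */
static void lock_grab_exclusive(struct task_lock *l)
{
    assert(lock_can_grab_exclusive(l));
    l->writer = 1;
}

/* Called only from the main thread, after the task reports back. */
static void lock_release_exclusive(struct task_lock *l)
{
    l->writer = 0;
}
```

The point of the design is visible even in this toy version: all the mutation happens on one thread, so the “locks” are just bookkeeping, not synchronization primitives.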

The workqueue supports both locks and blocks. A lock is a shared or exclusive lock on some object, such as an input file. A block is a semaphore, with a count. Each time a task associated with a block is created (not run, just created), the block count is incremented. When the task completes, the block count is decremented. When the block count hits zero, it enables some other task to run. For example, this is how gold controls the task of mapping the input sections to the output sections, which can only happen after all the input object files have been read and added to the global symbol table. The block is decremented as each input task completes, and the mapping task waits for the block to hit zero.
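A block reduces to a bare counter, again touched only by the main thread (another sketch with invented names):

```c
/* A "block": counts outstanding blocking tasks. */
struct task_block {
    int count;
};

/* Called when a blocking task is created (not when it runs). */
static void block_add_blocker(struct task_block *b)
{
    ++b->count;
}

/* Called by the main thread when a blocking task completes.
   Returns nonzero when the last blocker is gone, i.e. when the
   dependent task (such as section mapping) may be queued. */
static int block_remove_blocker(struct task_block *b)
{
    return --b->count == 0;
}
```

So for the section-mapping example: each “read input file” task increments the block when created, and the mapping task becomes runnable exactly when the final decrement brings the count to zero.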

In some ways this is a relatively simple approach, with a minimal number of mutexes. However, it does require some careful reasoning about what objects a task will require access to. Because there are no locks associated with objects, there is no convenient way for a task running in a subsidiary thread to assert that it has access to the object. In order to avoid race conditions, the code simply has to be correct. And that means that anybody working on the code has to understand the locking strategy.

While I think this is a reasonable approach, I do not yet know how maintainable it will be over time. State machine implementations can be difficult for people to understand, and the high-level locking is vulnerable to low-level errors. I know that one of my characteristic programming errors is a tendency toward code that is overly complex, which requires global information to understand in detail. I’ve tried to avoid it here, but I won’t know whether I succeeded for some time.

It would certainly be simpler if I could have written the code in a straightforward fashion rather than breaking it up into tasks, and if I had tools to somehow make that run in parallel where possible. But that seems to be a pipe dream.

