As I’ve written before, multi-threaded code is hard to write. It becomes even harder to write when you find that lock contention is an efficiency issue. On a modern multi-core system locks require some sort of memory cache coordination between the various cores. If your program is written using fine-grained locks which may be held by threads running on different cores, access to those locks can become relatively expensive. For high performance code it then becomes desirable to avoid the locks when possible.
When you do this, you have to understand memory barriers. A memory barrier enforces a weak global ordering on memory reads and writes. Without a memory barrier, depending on cache behaviour, it is quite possible for one core to write A before B but for another core to see the write to B before it sees the write to A. Memory barriers enforce consistency.
Unfortunately, different architecture define memory barriers in different ways. And different implementations of different architecture implement them in different ways. So it is very easy to write code which works reliably in some places but fails occasionally in others.
Also, the compiler knows nothing about memory barriers (this will change to some extent with the next C++ standard and the atomic<> types). The compiler is generally free to casually reorder memory reads and writes as long as the final result is correct. When writing lock-free code using memory barriers, you need to carefully indicate what you are doing in two different ways: once for the processor and once for the compiler. This will normally be done via some sort of inline assembler construct. And of course you want to make this run as efficiently as possible, since that is the point of the exercise in the first place.
So basically getting all this right is a big hairball. The next C++ standard will push this hairball onto the compiler and library implementors. The types aren’t all that easy to understand, but at least they should work correctly. Until then, or when working in C, it’s really hard to write correct portable lock-free code. Better to avoid fine-grained locks, I think, or at least to avoid contention for them.
Some people argue that transactional memory will solve these problems. I am very skeptical of this. I’ll write more about this later.