Hyperthreaded Memory

One thing I didn’t really touch on in my earlier notes on multi-threaded programming is memory. As processors become increasingly hyperthreaded and multicored, access to shared memory becomes the bottleneck. The obvious recourse for processor designers is to break the sharing: each processor will have its own memory. We already see this in the Cell. And already in multi-core machines different processors have different local memory caches, although some multi-cores share an L2 cache. Those machines use complex cache snooping to maintain memory coherency among the processors.

So the highest performance of future programs is going to require many threads with processor affinity for threads, where the threads do not communicate via shared memory. Any access to shared memory is going to be a choke point, so people are going to want to write their programs to only access local memory.

That is probably a good thing for our future programming models. The difficulties with the multithreaded programming model all center on shared memory. If memory is not shared, we are in much better shape.

In this model, we need high bandwidth communication between the processors which does not go through shared memory. Ideally this will be modeled as a communication queue which can exist entirely in userland. Then different threads can exchange data via these communication queues. Presumably we would put a function call interface over the queues as well.

This model is really communicating processes rather than communicating threads. Without shared memory, they would only be threads in the sense that they share the same instructions, paged in from the same program file. Creating a new thread would amount to calling fork and breaking the processor affinity.

Shared memory would still be possible, of course, via the paging system. However, it would most likely require explicit acquire and release calls to control access to it.

Although it’s easier to write correct code in this model, it’s harder to write code in the first place. Casual sharing would be forbidden. Would people be willing to accept that? Is there an alternative model which gets us around the shared memory bottleneck?



4 responses to “Hyperthreaded Memory”

  1. ianw

    “In this model, we need high bandwidth communication between the processors which does not go through shared memory. Ideally this will be modeled as a communication queue which can exist entirely in userland. Then different threads can exchange data via these communication queues.”

    This sounds a lot like a microkernel to me …

  2. Ian Lance Taylor

    Thanks for the note. Yes, it probably is something like a micro-kernel. I recall that Mach had some sort of general I/O portal, but I never looked at how it was implemented.

  3. ncm

    As the complexity of caches and memory systems grows without bound, my expectation that they will all be implemented correctly approaches zero. Joe Coder’s ability to program these memory systems correctly starts out near zero, despite his great confidence in his skills. Bjarne Stroustrup reports that the “lockless programming” literature consists of alternations between “look, you can do this” and “no, that doesn’t work”.

    Eliminating shared memory between processors would simplify cache systems to the point of reliability. With MPI and pipes, it seems possible to code them reliably as well. MPI and kernels, then, would need to be right, but that means only a few people need know what they’re doing (unlike Joe Coder); if they get it wrong, they can fix it (unlike cache subsystems).

  4. Ian Lance Taylor

    It seems like a very plausible scenario to me.
