
Joe Duffy is a program manager on the CLR team and recently wrote a post on the concurrency plan he is orchestrating for the next few releases of the CLR. He mentions a host of possible new abstractions, such as futures and barriers.
I listened to Duffy’s PDC presentation on concurrency, in which he gave a number of examples of why multithreaded programming is hard.
In his first example, he demonstrated a naive spin-lock implementation that turned out to be both wasteful and unfair. The correct implementation he then showed had to account for hyperthreading and other low-level issues. He made it clear that most programmers should not write their own locks but should use the system-provided ones. Unfortunately, the existing locks in the framework aren’t the best, though they will probably be improved in future versions.
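Duffy’s examples were in C# against the CLR; as a rough sketch of the same idea, here is a minimal test-and-test-and-set spin lock in Java (my own illustrative code, not Duffy’s, and deliberately omitting the fairness and hyperthreading refinements he discussed):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative spin lock: spin on a plain read first, then attempt the
// atomic acquire. This avoids the naive version's flaw of hammering the
// cache-coherence protocol with atomic writes on every spin iteration.
final class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (true) {
            // "Test" phase: cheap read-only spin while the lock is held.
            while (held.get()) {
                Thread.onSpinWait(); // hint to the CPU (e.g. PAUSE on x86)
            }
            // "Test-and-set" phase: one atomic attempt to take the lock.
            if (held.compareAndSet(false, true)) {
                return;
            }
        }
    }

    void unlock() {
        held.set(false); // AtomicBoolean gives this write release semantics
    }
}
```

Even this version is unfair (an unlucky thread can lose the compareAndSet race indefinitely), which is exactly the kind of subtlety Duffy used to argue for relying on system-provided locks.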
I had earlier seen a similar demonstration by Jeffrey Richter of how to correctly implement a spin lock, but Richter’s implementation differed from Duffy’s in quite a few ways. Richter’s spin lock, for instance, P/Invoked SwitchToThread from Kernel32.dll on a single-processor machine and called Thread.SpinWait on multiprocessor machines. The differences made me question whether one or both of these locks were still incorrectly implemented. If the experts can’t agree on getting it right, how can the rest of us expect to?
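The rationale behind Richter’s processor-count check can be sketched like this (a Java analogue with hypothetical names; Thread.yield roughly plays the role of SwitchToThread, and a Thread.onSpinWait loop that of Thread.SpinWait):

```java
// Sketch of the single- vs. multi-processor backoff policy.
// On one processor, spinning is pure waste: the lock holder cannot make
// progress until we give up our quantum. On several processors, a short
// spin is often cheaper than a context switch, because the holder may
// release the lock on another core within a few cycles.
final class Backoff {
    static void pause(boolean multiProcessor, int spins) {
        if (multiProcessor) {
            for (int i = 0; i < spins; i++) {
                Thread.onSpinWait(); // busy-wait without yielding the CPU
            }
        } else {
            Thread.yield(); // hand the quantum to another runnable thread
        }
    }
}
```

This is only the policy skeleton; the real implementations differ in exactly the details (spin counts, escalation to blocking, hyperthreading hints) that made me doubt them.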
In another example, Duffy described how compiler and hardware optimization techniques, such as instruction reordering, which typically don’t affect the behavior of single-threaded code, can produce incomprehensible “Heisenbugs” in multithreaded programs.
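The classic shape of such a reordering bug can be shown in a few lines. This Java sketch (my own illustration, not taken from Duffy’s talk) publishes a value through a flag; without the volatile modifier, the compiler or CPU would be free to reorder the two writes, letting the reader see the flag set while the data is still stale:

```java
// Publication through a flag. The volatile modifier on "ready" forbids
// the reordering that causes the Heisenbug: the write to "data" must
// become visible before the volatile write to "ready" (release), and the
// reader's volatile read of "ready" (acquire) makes "data" visible too.
class ReorderingDemo {
    int data = 0;
    volatile boolean ready = false;

    void writer() {
        data = 42;    // ordinary write, published by the flag below
        ready = true; // volatile write: may not be hoisted above data = 42
    }

    int reader() {
        while (!ready) {          // volatile read: acquire semantics
            Thread.onSpinWait();
        }
        return data;              // guaranteed to observe 42
    }
}
```

Remove the volatile and the bug becomes exactly the kind Duffy described: it appears only under optimization, only on some hardware, and tends to vanish the moment you attach a debugger.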