Some minor improvements:
- Make StaticSpinMutex non-copyable.
- Add LIKELY to Lock.
- Move LockSlow into the .cpp file (now that one exists).
- The only non-trivial change: use proc_yield(1) instead of proc_yield(10), with a proportional increase in the number of spin iterations. The latency of the PAUSE instruction has risen from ~1 cycle to ~100 cycles on recent Intel CPUs, so proc_yield(10) is too aggressive a backoff.
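A rough sketch of how these pieces fit together, written as standalone C++ rather than the actual sanitizer code. `SpinMutexSketch` and the local `proc_yield` are hypothetical stand-ins, and `std::atomic` replaces the sanitizer's own atomics. Copying is deleted explicitly to mirror the change, even though `std::atomic` is already non-copyable. The threshold of 100 spins is an assumption chosen so that the total number of PAUSEs stays roughly the same as 10 spins of proc_yield(10).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Branch hint for the uncontended fast path (GCC/Clang builtin).
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

// Hypothetical stand-in for proc_yield: issue `cnt` PAUSE hints.
inline void proc_yield(int cnt) {
  for (int i = 0; i < cnt; i++) {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier only
#endif
  }
}

class SpinMutexSketch {
 public:
  SpinMutexSketch() = default;
  // Non-copyable: copying a lock word would duplicate lock state.
  SpinMutexSketch(const SpinMutexSketch&) = delete;
  SpinMutexSketch& operator=(const SpinMutexSketch&) = delete;

  void Lock() {
    // Fast path: an uncontended acquire is the expected case.
    if (LIKELY(TryLock())) return;
    LockSlow();  // contended path, kept out of line
  }

  bool TryLock() {
    return state_.exchange(1, std::memory_order_acquire) == 0;
  }

  void Unlock() {
    assert(state_.load(std::memory_order_relaxed) == 1);  // must be held
    state_.store(0, std::memory_order_release);
  }

 private:
  void LockSlow();
  std::atomic<int> state_{0};
};

// Defined outside the class body, the way the real LockSlow now lives in the .cpp file.
void SpinMutexSketch::LockSlow() {
  for (int i = 0;; i++) {
    if (i < 100)
      proc_yield(1);  // short, cheap backoff per iteration
    else
      std::this_thread::yield();  // give up the CPU after spinning a while
    // Test-and-test-and-set: only try the exchange when the lock looks free.
    if (state_.load(std::memory_order_relaxed) == 0 &&
        state_.exchange(1, std::memory_order_acquire) == 0)
      return;
  }
}
```

With a single PAUSE costing on the order of 100 cycles, each spin iteration already waits a meaningful amount of time. The extra iterations let a waiter notice a released lock sooner than batches of ten PAUSEs would.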