The SigCtx function lazily allocates a ThreadSignalContext and stores it
in the ThreadState. This function may be called by various interceptors and
by the signal handler itself.
If SigCtx itself is interrupted by a signal, then (prior to this fix) two
ThreadSignalContexts could be allocated. This not only leaks one of them,
it also fails to deliver the signal to the program's signal handler,
because the recorded signal is overwritten by the new ThreadSignalContext.
Fix this by using a CAS to swap in the ThreadSignalContext, preventing the
race. Add a test for this case.
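
A minimal sketch of the CAS-based lazy initialization described above,
written against std::atomic rather than the sanitizer's internal atomics.
The names SignalContextSketch, ThreadStateSketch and GetOrCreateSigCtx are
hypothetical stand-ins, not the actual code in this change. Plain new/delete
is used only for brevity; it is not async-signal-safe, so a real signal path
would need a different allocator.

```cpp
#include <atomic>

// Hypothetical stand-in for ThreadSignalContext; only the shape matters.
struct SignalContextSketch {
  int pending_signal = 0;
};

// Hypothetical stand-in for the per-thread state that owns the context.
struct ThreadStateSketch {
  std::atomic<SignalContextSketch*> signal_ctx{nullptr};
};

// Returns the thread's context, allocating it on first use. If a signal
// handler interrupts this function and installs its own context first,
// the CAS fails and the handler's context is returned instead.
SignalContextSketch* GetOrCreateSigCtx(ThreadStateSketch& thr) {
  SignalContextSketch* ctx = thr.signal_ctx.load(std::memory_order_acquire);
  if (ctx != nullptr) return ctx;

  SignalContextSketch* fresh = new SignalContextSketch();
  SignalContextSketch* expected = nullptr;
  if (thr.signal_ctx.compare_exchange_strong(expected, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
    return fresh;  // We won: our allocation is now the published context.
  }

  // We lost: `expected` now holds the pointer that was already stored,
  // so discard our allocation and use the existing context.
  delete fresh;
  return expected;
}
```

On a failed exchange, compare_exchange_strong writes the currently stored
pointer into `expected`, so the losing path can return it without issuing a
second load.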
A failed atomic_compare_exchange already loads the current value into ctx, so isn't this additional atomic_load unnecessary?