This improves unwind performance quite substantially, and follows
a somewhat similar approach used in libgcc_s as described in the
thread here:
https://gcc.gnu.org/ml/gcc/2005-02/msg00625.html
On certain extremely exception heavy internal tests, the time
drops from about 80 minutes to about five minutes.
Do we need any synchronization here if there are multiple threads?