Copying all of the saved register state on every entry to
parseInstruction is a severe performance cost, especially
because most of this saved state is never used. On x86 Linux
this is about 560 bytes, and it is larger on other platforms.
When performance testing libunwind, this memcpy appears at the
top of the profile for nearly all of our tests.

By saving this state only as needed, we see performance improvements
of around 2.5% for the ctak test here. Certain internal, extremely
exception-heavy tasks run in about 2/3 of the time.
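A minimal C sketch of the "save only as needed" idea, assuming a per-register scheme: each register's initial rule is copied into a separate snapshot the first time that register's rule is about to change, instead of copying the whole state on every entry. All names here (`RegLoc`, `RegState`, `save_if_needed`, `set_rule`, `restore_rule`, `NUM_REGS`) are hypothetical and are not libunwind's actual API; the register count is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32 /* hypothetical register count */

typedef enum { LOC_UNUSED, LOC_IN_CFA, LOC_IN_REGISTER } LocKind;

/* One register's unwind rule (hypothetical layout). */
typedef struct {
    LocKind kind;
    bool initial_saved; /* has this register's initial rule been stashed? */
    int64_t value;
} RegLoc;

typedef struct {
    RegLoc regs[NUM_REGS];
} RegState;

static unsigned save_count = 0; /* number of per-register copies made */

/* Stash reg's initial rule in `initial` the first time it is about to
 * change, rather than copying the whole RegState up front. */
static void save_if_needed(RegState *cur, RegState *initial, int reg) {
    if (!cur->regs[reg].initial_saved) {
        initial->regs[reg] = cur->regs[reg];
        cur->regs[reg].initial_saved = true;
        save_count++;
    }
}

/* Change a register's rule, saving its initial rule lazily. */
static void set_rule(RegState *cur, RegState *initial, int reg,
                     LocKind kind, int64_t value) {
    assert(reg >= 0 && reg < NUM_REGS);
    save_if_needed(cur, initial, reg);
    cur->regs[reg].kind = kind;
    cur->regs[reg].value = value;
}

/* Put reg back to its initial rule (what DW_CFA_restore-style operations
 * need). If it was never saved, it was never changed, so nothing to do.
 * The copy also resets initial_saved to false. */
static void restore_rule(RegState *cur, const RegState *initial, int reg) {
    assert(reg >= 0 && reg < NUM_REGS);
    if (cur->regs[reg].initial_saved) {
        cur->regs[reg] = initial->regs[reg];
    }
}
```

In this sketch, entering the parser costs nothing; a copy happens at most once per register that actually changes, which is where the savings come from when most of the state goes unused.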

Note that because the new boolean is stashed inside what had been
padding in the original structure, this change uses no additional memory.
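The padding argument can be checked with a small hypothetical layout, assuming a register-location record made of an enum (typically 4 bytes) followed by an `int64_t`. On ABIs where `int64_t` is 8-byte aligned inside structs (x86-64, for example), the compiler inserts padding after the enum, so a `bool` placed there leaves `sizeof` unchanged. On ABIs without that padding the check is skipped. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { LOC_UNUSED, LOC_IN_CFA, LOC_IN_REGISTER } LocKind;

/* Hypothetical layout before adding the flag. */
typedef struct {
    LocKind kind;
    int64_t value;
} RegLocBefore;

/* Same layout with the flag placed directly after the enum. */
typedef struct {
    LocKind kind;
    bool initial_saved;
    int64_t value;
} RegLocAfter;

/* Bytes of padding the compiler inserts between `kind` and `value`. */
static size_t padding_after_kind(void) {
    return offsetof(RegLocBefore, value) - sizeof(LocKind);
}

/* Print both layouts and, when padding exists, assert the flag is free. */
static void check_flag_is_free(void) {
    printf("before: %zu bytes (%zu bytes of padding)\n",
           sizeof(RegLocBefore), padding_after_kind());
    printf("after:  %zu bytes\n", sizeof(RegLocAfter));
    if (padding_after_kind() >= sizeof(bool)) {
        assert(sizeof(RegLocAfter) == sizeof(RegLocBefore));
    }
}
```

On a typical x86-64 Linux build, both structs come out at 16 bytes, with 4 bytes of padding after the enum in the original layout.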