Currently, funclets reuse the same stack slots that the parent function uses for saving callee-saved XMM registers. If the parent function modifies a callee-saved XMM register before an exception is thrown, the catch handler saves the modified register into the same slot and overwrites the original saved value.
This patch allocates a second set of stack slots for the EH funclets to use when saving these same registers. In the long term, it would be better to determine actual CSR use by the funclets and only allocate the extra space when needed.
I don't think this will work in practice, because we are storing the XMM registers into the parent function's stack frame, but the .seh_savexmm directive describes locations relative to the funclet's RSP. So when the stack is unwound through the funclet (e.g. when an exception is thrown out of a catch block), the XMM CSRs will not be restored correctly.
Another idea I had for fixing this was to change X86FrameLowering::getFrameIndexReference to do something special for XMM CSRs (we should be able to find the list of them somewhere), and resolve them to some SP-relative offset in the funclet's frame. We'll have to adjust the "SUB RSP, 32" that we currently emit for every funclet as well for that to work.
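To make the suggestion above concrete, here is a rough sketch of the frame layout it implies. This is a standalone model, not LLVM code: FuncletFrame, XmmSpill and layoutFuncletFrame are hypothetical names, and it assumes the funclet's RSP is 16-byte aligned just before the adjustment and that the XMM slots are placed directly above the 32 bytes of shadow space.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a funclet frame; none of these names exist in LLVM.
struct XmmSpill {
    unsigned reg;          // XMM register number, e.g. 6 for XMM6
    std::int64_t spOffset; // offset from the funclet's RSP after its prologue
};

struct FuncletFrame {
    std::int64_t stackAdjust;     // would replace the fixed "SUB RSP, 32"
    std::vector<XmmSpill> spills; // one SP-relative slot per XMM CSR
};

constexpr std::int64_t kShadowSpace = 32; // the 32 bytes every funclet reserves today
constexpr std::int64_t kXmmSlotSize = 16; // one full 128-bit register per slot

// Assign each XMM CSR a slot above the shadow space and grow the
// stack adjustment to cover the slots (assumed layout, see lead-in).
FuncletFrame layoutFuncletFrame(const std::vector<unsigned>& xmmCSRs) {
    FuncletFrame frame;
    std::int64_t offset = kShadowSpace;
    for (unsigned reg : xmmCSRs) {
        // Each slot stays 16-byte aligned relative to the funclet's RSP.
        assert(offset % kXmmSlotSize == 0);
        frame.spills.push_back({reg, offset});
        offset += kXmmSlotSize;
    }
    frame.stackAdjust = offset;
    return frame;
}
```

With XMM6, XMM7 and XMM8 as the CSRs, this model gives a stack adjustment of 80 bytes with slots at RSP+32, RSP+48 and RSP+64. Those SP-relative offsets are what getFrameIndexReference would need to return inside a funclet, so that the stores and the .seh_savexmm directives describe the same locations.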