On Windows, the callee-saved registers in a canonical prologue are ordered starting from a lower register number at a lower stack address (with any gap needed for aligning the stack placed at the top); this is the opposite of the order that LLVM normally produces.
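As a rough illustration of the layout described above (not code from the patch), the sketch below assigns offsets within a callee-save area. The `CalleeSaveLayout` struct and `layoutAscending` helper are hypothetical names, registers are modelled as plain integers (19 stands for x19), and it assumes 8-byte GPR slots and 16-byte stack alignment.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical result type; not an LLVM data structure.
struct CalleeSaveLayout {
  std::map<int, std::size_t> offset; // register number -> byte offset from the bottom of the area
  std::size_t areaSize;              // total size, rounded up to 16 bytes
};

// Hypothetical helper. `regs` must be sorted ascending (19 stands for x19).
CalleeSaveLayout layoutAscending(const std::vector<int> &regs) {
  CalleeSaveLayout layout{};
  std::size_t next = 0;
  for (int reg : regs) {
    layout.offset[reg] = next; // lower register number gets the lower address
    next += 8;                 // assumed slot size for a 64-bit GPR
  }
  // Round up to the assumed 16-byte stack alignment; any padding ends up at the top.
  layout.areaSize = (next + 15) / 16 * 16;
  return layout;
}
```

For [x19, x20, x21] this places x19 at offset 0, x20 at 8 and x21 at 16, leaving an 8-byte alignment gap at the top of a 32-byte area.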
To achieve this, reverse the order of the registers in the assignCalleeSavedSpillSlots callback, and adjust the code that matches up register pairs to expect them in reverse order. To pair them properly when iterating in reverse, only form GPR pairs whose lower register is odd numbered (e.g. for [x19, x20, x21], iterated over backwards, don't pair up x21 with x20, but leave x21 alone and pair x20 with x19).
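The pairing rule can be sketched roughly as follows. This is a simplified standalone model, not the code in the patch: `pairReversed` is a hypothetical name, registers are plain integers (19 stands for x19), and only GPRs are considered.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the reverse-order pairing. `regs` must be sorted
// ascending; the function walks it backwards and returns the groups in the
// order they are visited. Each group holds one register or a {low, high} pair.
std::vector<std::vector<int>> pairReversed(const std::vector<int> &regs) {
  std::vector<std::vector<int>> groups;
  for (std::size_t i = regs.size(); i > 0;) {
    int high = regs[i - 1];
    // Pair only when the next register (going backwards) is the immediate
    // predecessor and is odd numbered, so every pair starts at an odd GPR.
    if (i >= 2 && regs[i - 2] + 1 == high && regs[i - 2] % 2 == 1) {
      groups.push_back({regs[i - 2], high});
      i -= 2;
    } else {
      groups.push_back({high});
      i -= 1;
    }
  }
  return groups;
}
```

With [x19, x20, x21] as input, x21 is visited first and stays unpaired, and x20 is then paired with x19, matching the example above.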
This lets generated prologues more often match the canonical format, which in turn allows the unwind info to be written as packed info.
I've got an alternative implementation of the same thing that keeps the register iteration order normal, but changes computeCalleeSaveRegisterPairs to lay things out from the bottom up instead of top down (contrary to the generic code in PrologEpilogInserter, which still lays out the matching stack objects top down). This results in a slightly smaller code change, but has the effect that the register names for the CSR stack objects (visible in MIR) no longer match where the registers are really saved.
I've also got two more patches coming up on top of this one, but I'm holding them off until this one is settled, as one of them depends on the exact form of this one.
With all of them applied, a 228 KB xdata section shrinks by 74 KB thanks to being able to write packed unwind info, ending up with smaller xdata than the corresponding section for an x86_64 build of the same DLL.
This if statement ends up being a little confusing to read... it may be worth duplicating the if (Reg2 == Reg1 + 1) check into both the integer and FP branches of the if statement.
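One possible shape for that suggestion is sketched below. It is purely illustrative and not the code under review: `RegKind` and `canPair` are hypothetical names, Reg1 is assumed to be the lower-numbered register, and the FP branch is assumed to need only the consecutive-register check.

```cpp
// Hypothetical register classes; not LLVM's real types.
enum class RegKind { GPR, FPR };

// Hypothetical predicate. Assumes reg1 is the lower-numbered candidate and
// reg2 the higher one.
bool canPair(RegKind kind, unsigned reg1, unsigned reg2) {
  if (kind == RegKind::GPR) {
    // Integer branch: consecutive registers, with the pair starting at an
    // odd-numbered GPR (x19, x21, ...).
    return reg2 == reg1 + 1 && reg1 % 2 == 1;
  }
  // FP branch: the consecutive check is repeated here rather than shared.
  // Assumption: FP pairs have no odd-start restriction.
  return reg2 == reg1 + 1;
}
```

Repeating the check in each branch keeps every branch readable on its own, at the cost of one duplicated comparison.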