The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough: it only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).
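Instead, include the size of the fixed stack area used for win64 varargs in the calculations in emitPrologue/emitEpilogue. The stack consists of mainly three parts: the local stack area, the callee-saved register area, and the fixed stack area used for win64 varargs.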
The latter two are often used together (for the offset from the original stack position to the start of the local stack area) and are summed up in the CSStackSize variable. (Reviewer question: Should we keep it this way, or keep CSStackSize as is and create a new variable for the sum, such as RegDumpSize?)
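As a rough illustration of the arithmetic described above, here is a minimal standalone sketch. It is not the code from the patch: the struct, field names, and helper functions are hypothetical, and the sizes are made up. It only shows why the fixed varargs area has to be part of the first stack pointer adjustment when the bump is split (CombineSPBump == false).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the three stack regions (names are illustrative,
// not the identifiers used in the AArch64 backend).
struct FrameLayout {
  int64_t FixedObject;     // fixed area used for win64 varargs
  int64_t CalleeSavedSize; // callee-saved register area
  int64_t LocalStackSize;  // local stack area
};

// Offset from the original stack position to the start of the local
// stack area: the sum of the fixed varargs area and the callee saves.
int64_t prologueSaveSize(const FrameLayout &L) {
  assert(L.FixedObject >= 0 && L.CalleeSavedSize >= 0);
  return L.FixedObject + L.CalleeSavedSize;
}

// Stack pointer decrements a prologue would perform in this model.
// With a combined bump there is one adjustment covering everything;
// otherwise the save area (including the fixed varargs area) is
// allocated first and the locals second.
std::vector<int64_t> spBumps(const FrameLayout &L, bool CombineSPBump) {
  int64_t Save = prologueSaveSize(L);
  if (CombineSPBump)
    return {Save + L.LocalStackSize};
  return {Save, L.LocalStackSize};
}
```

In both cases the total adjustment is the same; only the split differs. Omitting the fixed area from the first bump in the split case would leave the locals at the wrong offset, which is the kind of error this change addresses.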
In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register are actually stored. Besides the prologue/epilogue, this also requires changes to resolveFrameIndexReference.
Add tests for a function that keeps a frame pointer and that uses a VLA.
Outside of the tests in the testsuite, I've now tested this with a large array of different vararg function variants, including VLAs, both when targeting win64 and when using the win64 calling convention on Linux. On Linux, I've also tested that C++ exceptions can be properly passed through a win64cc vararg function, and checked that the produced .cfi directives look correct.