Pass parameters properly in calls to such functions (pass all floats in integer registers), and handle va_start properly (allocate stack immediately below the arguments on the stack, to save the register arguments into a single continuous array).
Details
Diff Detail
Event Timeline
Added an initial test, fixed va_copy to copy the right amount of bytes, fixed the va_list pointer if the first variable argument is on the stack.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
2889 | I don't understand this line, particularly the GPRSaveSize & 15. I'd expected to see -GPRSaveSize since the args are on the stack just before the canonical frame address (i.e. SP on function entry). I assume the masking is a crude version of alignTo? I don't think that's going to be right. You need to make sure the stack is aligned in FrameLowering, but you're still going to want the FrameIndex that gets put into a va_list is able to point an odd number of registers below SP on entry if necessary. Things look equally bad when I dump the MIR (e.g. from your first test-case): [...] fi#-1: size=56, align=8, fixed, at location [SP+8] [...] STRXui %X1<kill>, <fi#-1>, 0; mem:ST8[FixedStack-1+8] So LLVM thinks X1 is going at the incoming SP+8 rather than (as actually happens) incoming SP-56 (I think). So I think something is going horribly wrong during FrameLowering and LLVM is losing track of where SP actually is compared to function entry. I'll try to take a look again later, but I've not spotted the issue yet. | |
2901–2902 | I think this is lying about the location isn't it? They're not all being stored at GPRIdx. |
I've thought some more and it seems pretty clear now that the difference is exactly that extra GPRSaveSize in FrameLowering. That needs to be incorporated into LLVM's view of the stack size and layout.
I think just using the correct offset in CreateFixedObject will do it for the size (though this is a pretty hazy memory from when I looked in detail multiple years ago so take it with a pinch of salt). That leaves the layout as the real problem: you can probably skip the emitFrameOffset but you need to convince FrameLowering to leave a gap before storing the callee-saved registers instead.
OK, I now see that this actually works. You're essentially redefining the CFA (canonical frame address -- where LLVM bases its SP calculations) to the aligned SP after the extra GPRs have been pushed.
It's not thoroughly broken, but it is confusing and as far as I'm aware unprecedented (32-bit ARM always does this kind of jig to save varargs GPRs but doesn't mangle the CFA in the process). I still think switching back to the original CFA along roughly the lines I proposed earlier is the best way to implement this.
Thanks for taking a look! Do you have a more concrete suggestion on how to make frame lowering leave a gap at the top, that doesn't mess up the layout like this does? This was a result of trying to take inspiration from the same code for arm, with a lot of blind guesswork and trial and error. If not, I can try to get back to trying things later today.
Things look equally bad when I dump the MIR (e.g. from your first test-case):
What parameters do you use to produce this MIR dump btw? I found http://llvm.org/docs/MIRLangRef.html but llc -stop-after=machine-cp doesn't really give me the things you quoted.
Avoiding the extra manual stack adjustment, using CreateFixedObject properly (or at least more properly than before).
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4591–4593 | IMO it would be nice to make this match with two ifs. Now that we have 3 cases, the old ternary conditional doesn't look right. |
This seems like the right approach to me. It's like the Win x64 "shadow slots" are allocated on the callee side instead of the caller side. We might want to consider implementing /homeparams at some point across the Windows-supporting backends, which will require more FrameLowering surgery.
Fixed the style issue pointed out by Reid, adjusted the test case and flipped the immutable parameter to CreateFixedObject.
Thanks for updating the patch. Looks really good to me, just one trivial comment about a comment.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
2833–2834 | Minor nit (just change and commit), but that's peripheral to why Win64 needs registers to be saved. The key point is something along the lines of "Win64 also passes variadic arguments in registers". |
Minor nit (just change and commit), but that's peripheral to why Win64 needs registers to be saved. The key point is something along the lines of "Win64 also passes variadic arguments in registers".