Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs. This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.
Details
Diff Detail
Event Timeline
Hi Geoff,
Unfortunately this has implications for exception unwinding on iOS. We have fast-path .debug_frame equivalents that only have 32 bits to store the prologue's format, and the format assumes the registers are stored in pairs.
On the other hand, we really don't want frame lowering to be any more complex than it absolutely has to be. For now, we probably have to go with the existing format on Darwin but I think we can probably come up with a better determineCalleeSaves that serves both needs.
To start with, some observations (none of them your fault, the existing code did it):
- UnspilledCSFPRs is completely unused.
- Only one value is really needed in UnspilledCSGPRs: we need one to scavenge, but that's it.
- Only the sum of NumGPRSpilled and NumFPRSpilled is used.
So, I think we can rewrite the top loop as:
// Figure out which callee-saved registers to save/restore. for (unsigned i = 0; CSRegs[i]; ++i) { const unsigned Reg = CSRegs[i]; bool RegUsed = SavedRegs.test(Reg); if (!RegUsed && AArch64::GPR64RegClass.contains(Reg) && !RegInfo->isReservedReg(MF, Reg)) { // Save this one for later in case we need a scratch register. UnspilledGPR = Reg; continue; } // MachO's compact unwind format relies on all registers being stored in pairs. // FIXME: the usual format is actually better if unwinding isn't needed. unsigned PairedReg = CSRegs[i ^ 1]; if (MF.getSubtarget().isTargetMachO() && !SavedRegs.test(PairedReg)) { SavedRegs.set(PairedReg); ExtraCSSpill = true; } } NumRegsSpilled = SavedRegs.size(); CanEliminateFrame = NumRegsSpilled == 0;
Do you think you could make something like that work, or have I overlooked a key use that brings my whole edifice crumbling down?
Cheers.
Tim.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
760 | I think this should be asserting 16: it matches the comment a few lines above, the corresponding set call, and the ABI requirement. |
Oh, and the debug printing can probably be a lambda'd-up std::for_each or similar to avoid complicating the control flow of the loop further.
NumRegsSpilled = SavedRegs.size();
Sorry, that would definitely be .count() if the loop's at all viable.
Tim,
Tim,
Thanks, this is the kind of thing I suspected was probably hiding out there. I assume these MachO fast unwind restrictions also would prevent doing non-leaf frame pointer elimination and redzone optimization? Do you think these restrictions are currently captured in the tests that I modified? Perhaps some of these tests need another non apple triple run/check so we cover both paths?
Hi Geoff,
I assume these MachO fast unwind restrictions also would prevent doing non-leaf frame pointer elimination and redzone optimization.
We'd probably want to keep non-leaf frame pointers independently of unwinding (people are very keen on getting good backtraces here). Similarly, we'd have to check our libc (etc) for red zone violations before it could be turned on.
Are you thinking of using the red zone for callee-saved registers too? That might have unwinding implications.
Do you think these restrictions are currently captured in the tests that I modified?
I think I should probably specifically write a few basic compact unwind codegen functions which would have broken with your patch. Just a sort of smokescreen to catch obvious things that should be using the compact format. I'll do that today (hopefully).
Cheers.
Tim.
Hi Tim,
I've uploaded a new version which should hopefully address the MachO unwind issues. I've tried to add more test coverage to the existing lit tests, but I think adding the MachO specific smoke tests would also be a good thing.
Thanks for the update regarding non-leaf frame pointer omission and red-zone usage. I'm still exploring the options at this point and just wanted to get any iOS specific requirements in my head early. The first step I'll take in this direction is probably just getting -fomit-frame-pointer to actually to work for non-leaf functions.
Thanks for rewriting it to accommodate MachO. I've got a couple of minor suggestions, but only saw one likely bug (the unconditional register increment).
Cheers.
Tim.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
702–707 | Possibly: assert(AArch64::GPR64RegClass.contains(RPI.Reg1) || AArch64::FPR64RegClass.contains(RPI.Reg1)); RPI.IsGPR = AArch64::GPR64RegClass.contains(RPI.Reg1); | |
760 | Did you decide a stack size == 8 (mod 16) was allowed in some circumstances? | |
957 | The logic would be marginally simpler if you set this unconditionally and then used the MachO test when actually using it. | |
1008 | This will be incremented even if UnspilledCSGPRPaired was already spilled. |
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
760 | I'm not sure that would be allowed. The AAPCS says "Additionally, at any point at which memory is accessed via SP, the hardware requires that SP mod 16 = 0. The stack must be quad-word aligned." It's a bit more complicated than that (it's configurable via SCTLR), but I think we should force anyone who wants to weaken that to think very hard about it. | |
1008 | Ah, neat! |
Thanks Tim. Are you okay with this going in, or should I wait for your additional test cases? I'm fine with either.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
760 | I don't think the hypothetical future change that I'm thinking of would violate that rule. |
Possibly: