Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).
This fixes https://github.com/llvm/llvm-project/issues/56182.
Differential D135687
[AArch64] Fix aligning the stack after calling __chkstk mstorsjo on Oct 11 2022, 9:03 AM. Authored by
Details Whenever a call to __chkstk was made, the frame lowering previously This fixes https://github.com/llvm/llvm-project/issues/56182.
Diff Detail
Event TimelineComment Actions In general, I think we need to pass the number of bytes required for realignment to __chkstk, instead of directly realigning the stack. As discussed on the bug. Comment Actions Oh, right, I didn't realize that we could fix that part of the requirement that easily. Comment Actions Actually, on second thought, either this requires a bit more changes than that (deviating more from the pattern for how __chkstk is used) or it doesn't actually fulfill the intent you have here (or I'm still overlooking something). Currently, the call to __chkstk looks like this: mov x15, #(NumBytes/16) bl __chkstk sub sp, sp, x15, lsl #4 If we'd include the alignment in the probing, it'd look like this: mov x15, #((NumBytes+Align)/16) bl __chkstk sub x15, sp, x15, lsl #4 and sp, x15, #(AlignMask) Here, the call to __chkstk does cover the extra align amount, but we also decrement the stack pointer by that amount, so when doing the alignment masking, we can still decrement the stack pointer further past what's been probed. To avoid this, I guess we'd need to add an sub x15, x15, #(Align/16) after the call, before decrementing the stack. Is that how you meant it? Comment Actions I was thinking you could do something like this: mov x15, #(NumBytes/16) ; Stack allocation size mov x9, sp and x9, x9, #AlignMask ; compute aligned stack sub x9, sp, x9 ; compute number of bytes required to align stack add x15, x15, x9, lsr #4 ; add bytes to chkstk input bl __chkstk sub sp, sp, x15, lsl #4 There's probably some slightly more efficient way to write this, but that's the idea. Comment Actions Hmm, ok, I see. I guess that'd work, but it feels a bit roundabout IMO. It looks like MSVC does something like this: mov x15, #(NumBytes+Align)/16 bl __chkstk sub sp, sp, x15, lsl #4 mov x8, sp add x9, x8, #(Align-1) and x0, x9, #AlignMask (The mov x8, sp feels superfluous here?) It doesn't align the stack pointer itself, but I think we should be able to do the same (while wasting fewer registers) by doing the last and into sp. I think I'd prefer something like this: mov x15, #(NumBytes+Align)/16 bl __chkstk sub sp, sp, x15, lsl #4 add x9, sp, #(Align -1) and sp, x9, #AlignMask Comment Actions After staring at my code sequence a bit more, it simplifies to: mov x15, #(NumBytes/16) mov x9, sp ubfx x9, x9, #4, #AlignBits add x15, x15, x9 bl __chkstk sub sp, sp, x15, lsl #4 Your suggested sequence also works, I guess, but it feels a little weird to overallocate, then deallocate the extra memory. MSVC doesn't actually realign the stack in the sense LLVM does; it just overallocates stack memory, then uses masking to align the addresses of variables. Comment Actions Yeah, that's a bit clumsy (but the amount of extra instructions is quite small). I find your suggestion very neat, but it assumes that NumBytes itself is aligned to the alignment, otherwise we break the realignment we calculated from the actual stack pointer. In practice it isn't - the total allocation of the stack frame (including CSRs) is aligned, but NumBytes isn't. And trying to calculate how much is spent on the other bits and adjusting it here feels very brittle too. To counter that, we'd have to overallocate the chkstk amount to make sure that it is aligned on its own, and that's not very pretty either. Comment Actions Updated to overallocate the chkstk amount based on the alignment, and then reducing the overallocation to align the stack pointer in the end. I'm not very fond of this solution (your version would be neater), but I'm updating the patch to bring it a bit forward for discussion at least. Comment Actions Oh, to account for unaligned NumBytes, my solution would have to be something more like: sub x15, sp, #NumBytes and x15, x15, #AlignMask sub x15, sp, x15 lsr x15, x15, lsr #4 bl __chkstk sub sp, sp, x15, lsl #4 (If NumBytes is large, add 1 or 2 instructions to materialize it.) Maybe emitting extra instructions to avoid overallocating isn't worth it, though...
Comment Actions Reduced the amount of overallocation; moving back by (RealignmentBytes-16) aka RealignmentPadding in the new revision instead of (RealignmentBytes-1) when aligning the pointer at the end. Using encodeLogicalImmediate to encode the immediate, added a missed setStackRealigned(true), added an assert that NeedsWinCFI is false at this point. Comment Actions Added a testcase for one case where things broke in practice with the previous form. Since the previous version, I moved the setting of NeedsRealignment further down in the function. Previously I initialized this variable early in the function, before the first call to windowsRequiresStackProbe. However, NumBytes may be nonzero at this point (making NeedsRealignment true), even if NumBytes would be zero at the point when realignment is done later in the function, prior to the patch. We don't need to include RealignmentPadding in the first initial check with windowsRequiresStackProbe, since if we're actually going to realign the stack, we do have a stack frame anyway. Additionally, don't set RealignmentPadding to a nonzero value, if the requested alignment is less than 16. In the added testcase, MFI.getMaxAlign() is 8. (Curiously, for non-Windows OSes, this input actually does result in generating code for ANDing the stack pointer to align it to 8 bytes though.)
|
This is allocating more than necessary? We only really need to add RealignmentBytes ? RealignmentBytes - 16 : 0 bytes.