This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Disable spill slot scavenging when stack realignment required.
AbandonedPublic

Authored by paulwalker-arm on Apr 18 2018, 6:44 AM.

Details

Summary

Prevent corruption of the callee-save region by spills whose offset calculation assumes FP-SP is a compile time constant.

Diff Detail

Event Timeline

paulwalker-arm created this revision.Apr 18 2018, 6:44 AM

Ugh, good find. Isn't this an issue if variable sized stack objects are present as well?

MatzeB requested changes to this revision.Apr 18 2018, 9:42 AM
MatzeB added a reviewer: thegameg.

Huh? You cannot just disable stack slot scavenging. Who guarantees you that you don't need it?
Is the bug related to the latest changes in PEI? (I've seen some commits talking about base and frame pointers flying by but did not have time to read them or dive into the details...)

This revision now requires changes to proceed.Apr 18 2018, 9:42 AM

@MatzeB This isn't about the scavenging of a register to use for large offsets, etc., but rather the scavenging of stack slots in the CSR save area to use for regular stack objects.

Oh sorry, I didn't notice there is another "scavenging" thing in there.

MatzeB resigned from this revision.Apr 18 2018, 10:42 AM

Ugh, good find. Isn't this an issue if variable sized stack objects are present as well?

The graphic (within the same file) showing the stack layout suggests a base pointer is used in such circumstances, so my guess is no.

Ugh, good find. Isn't this an issue if variable sized stack objects are present as well?

The graphic (within the same file) showing the stack layout suggests a base pointer is used in such circumstances, so my guess is no.

Yeah, I took a closer look and it appears that either the BP or FP will be used in this case, which should be okay.

I also took a look at fixing this failure by using the FP as a base for cases like this, and it turned out to be not too bad:

--- a/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1019,6 +1019,13 @@ int AArch64FrameLowering::resolveFrameIndexReference(const MachineFunction &MF,
   int FPOffset = MFI.getObjectOffset(FI) + FixedObject + 16;
   int Offset = MFI.getObjectOffset(FI) + MFI.getStackSize();
   bool isFixed = MFI.isFixedObjectIndex(FI);
+  bool isCSR = !isFixed && MFI.getObjectOffset(FI) >=
+                               -((int)AFI->getCalleeSavedStackSize());

   // Use frame pointer to reference fixed objects. Use it for locals if
   // there are VLAs or a dynamically realigned SP (and thus the SP isn't
@@ -1032,6 +1034,12 @@ int AArch64FrameLowering::resolveFrameIndexReference(const MachineFunction &MF,
     // Argument access should always use the FP.
     if (isFixed) {
       UseFP = hasFP(MF);
+    } else if (isCSR && RegInfo->needsStackRealignment(MF)) {
+      // References to the CSR area must use FP if we're re-aligning the stack
+      // since the dynamically-sized alignment padding is between the SP/BP and
+      // the CSR area.
+      assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
+      UseFP = true;
     } else if (hasFP(MF) && !RegInfo->needsStackRealignment(MF)) {
       // If the FPOffset is negative, we have to keep in mind that the
       // available offset range for negative offsets is smaller than for
@@ -1065,9 +1078,9 @@ int AArch64FrameLowering::resolveFrameIndexReference(const MachineFunction &MF,
     }
   }

-  assert((isFixed || !RegInfo->needsStackRealignment(MF) || !UseFP) &&
+  assert(((isFixed || isCSR) || !RegInfo->needsStackRealignment(MF) || !UseFP) &&
          "In the presence of dynamic stack pointer realignment, "
-         "non-argument objects cannot be accessed through the frame pointer");
+         "non-argument/CSR objects cannot be accessed through the frame pointer");

   if (UseFP) {
     FrameReg = RegInfo->getFrameRegister(MF);

@t.p.northover should probably approve whichever approach we take since presumably he will need to approve it for 6.0.1 anyway.

I think @gberry's solution is actually the correct fix. I don't really see why we shouldn't use empty stack slots if we can.

I guess AFI->getCalleeSavedStackSize is enough to get the boundaries of the "CSR region".

I've put the full diff for my alternative fix here: D46063

I think I prefer Geoff's solution too. Better to make it work than just turn off parts of the optimizer, and it's not even particularly more difficult to read.

paulwalker-arm abandoned this revision.Apr 30 2018, 8:03 AM

I also prefer Geoff's solution. Thanks Geoff.