This is an archive of the discontinued LLVM Phabricator instance.

Fix AArch64 prologue for empty frame with dynamic allocas
ClosedPublic

Authored by eugenis on Jun 29 2015, 5:22 PM.

Details

Summary

Fixes PR23804: assertion failure in emitPrologue in the case of a function with an empty frame and a dynamic alloca that needs stack realignment.
This is a typical case for AddressSanitizer.

Diff Detail

Event Timeline

eugenis updated this revision to Diff 28734.Jun 29 2015, 5:22 PM
eugenis retitled this revision from to Fix AArch64 prologue for empty frame with dynamic allocas.
eugenis updated this object.
eugenis edited the test plan for this revision. (Show Details)
eugenis added a reviewer: kristof.beyls.
eugenis set the repository for this revision to rL LLVM.
eugenis added a subscriber: Unknown Object (MLST).

I think there's more going on here that needs fixing too. For example if I add a simple "%tmp = alloca i8, i32 4" to your entry block I get the same assertion failure David mentioned in the PR (even with your patch).

To me it looks like NeedsRealignment is misdefined, though that might not be the end of it. Elsewhere we used RegInfo->needsStackRealignment(MF), which looks like it would be an improvement, but doesn't actually fix the issue.

Another possible problem in this area is that if NumBytes != 0, we end up realigning the stack twice: once in the prologue and once at the alloca. That's arguably a separate performance problem though.

Cheers.

Tim.

I think there's more going on here that needs fixing too. For example if I add a simple "%tmp = alloca i8, i32 4" to your entry block I get the same assertion failure David mentioned in the PR (even with your patch).

To me it looks like NeedsRealignment is misdefined, though that might not be the end of it. Elsewhere we used RegInfo->needsStackRealignment(MF), which looks like it would be an improvement, but doesn't actually fix the issue.

Another possible problem in this area is that if NumBytes != 0, we end up realigning the stack twice: once in the prologue and once at the alloca. That's arguably a separate performance problem though.

Good point, thanks. This particular case is easy to fix by reusing the base register for scratchSPReg. This matches register reservation code in processFunctionBeforeCalleeSavedScan.

RegInfo->needsStackRealignment is also true in this case, and it has roughly the same meaning as getMaxAlignment, i.e. it does not take alloca scopes into account. This problem (extra re-alignment) is also present on i686 and x86_64 targets.

eugenis updated this revision to Diff 28822.Jun 30 2015, 3:53 PM
eugenis removed rL LLVM as the repository for this revision.

PTAL

kristof.beyls edited edge metadata.Jul 9 2015, 1:16 PM

I think the logic of this patch is roughly right, I only have a few minor comments.
After addressing those comments, this LGTM.

lib/Target/AArch64/AArch64FrameLowering.cpp
352

This should become

const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);

That's in line with what other backends do and respects the DRY principle more.

356–358

I think slightly nicer/better logic is

if (RegInfo->hasBasePointer(MF))
  scratchSPReg = RegInfo->getBaseRegister();
else
  scratchSPReg = AArch64::X9;
assert(MF.getRegInfo().isPhysRegUsed(scratchSPReg) &&
       "No scratch register to align SP!");

With the above code, a few small tweaks to the tests in test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll are needed (to use the base pointer X19 instead of X9 for a few tests). When both X9 and X19 are available as scratch registers it's an arbitrary decision which one to use.

test/CodeGen/AArch64/aarch64-dynamic-stack-layout-pr23804.ll
1 ↗(On Diff #28822)

I don't think there is a need to create a separate test file just for pr23804, these test cases can be added to test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll

t.p.northover added inline comments.Jul 9 2015, 2:52 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
356–358

Can't we just nick any temporary register we please (I think the original comment miscategorises X9) and make sure it's used?

scratchSPReg = AArch64::X9;
MF.getRegInfo().setPhysRegUsed(scratchSPReg); // Take that!
eugenis added inline comments.Jul 9 2015, 3:11 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
356–358

The caller in PrologEpilogInserter.cpp does

TFI->processFunctionBeforeCalleeSavedScan(Fn, RS);

// Scan the function for modified callee saved registers and insert spill code
// for any callee saved registers that are modified.
calculateCalleeSavedRegisters(Fn);

...

if (!F->hasFnAttribute(Attribute::Naked))
  insertPrologEpilogCode(Fn);

So it looks like it is too late to add used regisgters at this point (in emitPrologue).

eugenis updated this revision to Diff 29401.Jul 9 2015, 3:19 PM
eugenis edited edge metadata.

Addressed reviewer's comments.

eugenis marked 2 inline comments as done.Jul 9 2015, 3:20 PM
t.p.northover added inline comments.Jul 9 2015, 3:23 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
354–358

But X9 isn't callee-saved (AAPCS says it's temporary), so I don't think we need to worry about stack layouts: it's basically dead at this point.

Unless there's some assertion that *everything* is fixed by now?

eugenis added inline comments.Jul 9 2015, 3:33 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
354–358

Well, all tests pass.
Would you like me to switch to your variant?

t.p.northover added inline comments.Jul 9 2015, 4:25 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
354–358

It seems like the simplest variant so far. I'd say so, unless Kristof objects.

kristof.beyls added inline comments.Jul 10 2015, 1:33 AM
lib/Target/AArch64/AArch64FrameLowering.cpp
354–358

I can't think of a reason not to do the simpler variant (always using X9 as the scratch register) - so no objections from my side.

eugenis updated this revision to Diff 29480.Jul 10 2015, 1:13 PM
eugenis marked an inline comment as done.

PTAL

I've only got two very minor suggestions for improvements.
With those improvements, LGTM!

lib/Target/AArch64/AArch64FrameLowering.cpp
354–358

I'd keep the comment

// Use the first callee-saved register as a scratch register

otherwise looks ok.

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
525–526

This comment isn't correct anymore in the newest version of the patch. I would just remove the comment.

eugenis updated this revision to Diff 29495.Jul 10 2015, 2:11 PM

updated comments

Committed as r241943. Thanks for the review!

eugenis accepted this revision.Jul 10 2015, 2:26 PM
eugenis added a reviewer: eugenis.
This revision is now accepted and ready to land.Jul 10 2015, 2:26 PM
eugenis closed this revision.Jul 10 2015, 2:26 PM