This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Ensure hasFP has a consistent return value
AbandonedPublic

Authored by bsmith on Feb 22 2021, 7:15 AM.

Details

Summary

Do the necessary FP profitability checks once, in a new pre CSR spilling
function, referring to that in hasFP. Also ensure that hasFP returns true
prior to this calculation.

This fixes various codegen issues caused by an inconsistent return value
of hasFP, such as PR49217.

Diff Detail

Event Timeline

bsmith created this revision.Feb 22 2021, 7:15 AM
bsmith requested review of this revision.Feb 22 2021, 7:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2021, 7:15 AM
sdesmalen added inline comments.Feb 24 2021, 3:29 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
377

nit: maybe add a TODO/FIXME comment to say that this approach is probably temporary and may be removed in the future when LocalStackSlotAllocation works on multiple StackIDs. (as that will have created virtual base pointers, so there's no long the need to always force the availability of the FP).

2957

Given that the approach is probably temporary, does it necessarily need a new generic callback, or could this be called from early-on in determineCalleeSaves?

2961

This can probably do with an early bail-out if it is not compiling for SVE.

llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll
45

Why does this function not require the frame-pointer? (it has both locals for passing GPRs and ZPRs)

efriedma added inline comments.Feb 24 2021, 2:14 PM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
383

My team was testing out this patch internally, and this mechanism doesn't work quite as expected.

Specifically, changing the value of hasFP() after register allocation has a bad interaction with register scavenging. If hasFP() returns false, the frame lowering code can decide to reserve FP as the emergency spill, so we don't allocate an emergency spill slot. But register scavenging doesn't realize this: it queries MachineRegisterInfo::isReserved(), which is cached from before register allocation. So in rare cases, scavenging can't find a register, and we can hit the fatal error "Cannot scavenge register without an emergency spill slot!"

One possible solution here is to change AArch64RegisterInfo::getReservedRegs() so it doesn't depend on hasCalculatedSVEShouldUseFP(), so the reserved registers don't change after regalloc.

Currently don't have a testcase I can share; let me know if you need one, and I'll try to come up with something.

bsmith updated this revision to Diff 327136.Mar 1 2021, 9:04 AM
bsmith marked 3 inline comments as done.
  • Always explicitly reserve x29 when building for SVE
  • Add comments to indicate the fact that this is a temporary change
  • Add early bailout when not building for SVE
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2957

Ideally probably not, however determineCalleeSaves is actually called from more that just PrologEpilogInserter, hence it makes more sense to call this directly from PrologEpilogInserter rather than determineCalleeSaves.

llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll
45

Unfortunately I think this is just going to have to be a deficiency in this approach and will have to get fixed when done properly.

The issue here is that when processFunctionBeforeCalleeSpill() is called the stack has only one object on it (the SVE one), the non-SVE objects only appear during assignCalleeSavedSpillSlots() in PrologEpilogInserter, which is done after determineCalleeSaves(). The calculation of whether an FP should be used can't be delayed until after this point otherwise you end up with rogue frame pointer setup where it isn't used.

bsmith updated this revision to Diff 327417.Mar 2 2021, 5:26 AM
  • Fix new test that appeared after rebase.
bsmith abandoned this revision.Mar 10 2021, 6:59 AM

Given this approach has other downsides (namely reserving x29 all the time for SVE), I think instead of doing this I'll revert the previous patch this fixes, and wait until we have a better approach for this.