Page MenuHomePhabricator

[RISCV][PrologEpilogInserter] "Float" emergency spill slots to avoid making them immediately unreachable from the stack pointer
ClosedPublic

Authored by rogfer01 on Oct 12 2020, 7:44 AM.

Details

Summary

In RISC-V there is a single addressing mode of the form imm(reg) where imm is a signed integer of 12-bit with a range of [-2048..2047] bytes from reg.

The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite exercises several scenarios with the stack, including function call where the stack will need to be realigned to to a local variable having a large alignment of 4096 bytes.

In situations of large stacks, the RISC-V backend (in RISCVFrameLowering) reserves an extra emergency spill slot which can be used (if no free register is found) by the register scavenger after the frame indexes have been eliminated. PrologEpilogInserter already takes care of keeping the emergency spill slots as close as possible to the stack pointer or frame pointer (depending on what the function will use). However there is a final alignment step to honour the maximum alignment of the stack that, when using the stack pointer to access the emergency spill slots, has the side effect of setting them farther from the stack pointer.

In the case of the frame_layout testcase, the net result is that we do have an emergency spill slot but it is so far from the stack pointer (more than 2048 bytes due to the extra alignment of a variable to 4096 bytes) that it becomes unreachable via any immediate offset.

During elimination of the frame index, many (regular) offsets of the stack may be immediately unreachable already. Their address needs to be computed using a register. A virtual register is created and later RegisterScavenger should be able to find an unused (physical) register. However if no register is available, RegisterScavenger will pick a physical register and spill it onto an emergency stack slot, while we compute the offset (restoring the chosen register after all this). This assumes that the emergency stack slot is easily reachable (this is, without requiring another register!).

This is the assumption we seem to break when we perform the extra alignment in PrologEpilogInserter.

My understanding is that we can "float" the emergency spill slots by increasing (in absolute value) their offsets from the incoming stack pointer. This way the emergency spill slots will remain close to the stack pointer (once the function has allocated storage for the stack, including the needed realignment). The new size computed in PrologEpilogInserter is padding so it should be OK to move the emergency spill slots there. Also because we're increasing the alignment, the new location should stay aligned for the purpose of the emergency spill slots.

Note that this change also impacts other backends as shown by the tests. So far they seem minor adjustments to the emergency stack slot offset to me, making it even closer to the stack pointer than before, so they seem OK.

A final comment: this seems an issue that could impact any target. However this problem seems only easily reproducible in RISC-V. For some reason other targets (such as ARM or AArch64) always happen to find a free register and the emergency spill slot (if any) ends going unused. Why we are so unlucky in RISC-V I have not been able to determine yet but this seems some side-effect of other unrelated changes in CodeGen.

Diff Detail

Event Timeline

rogfer01 created this revision.Oct 12 2020, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2020, 7:44 AM
rogfer01 requested review of this revision.Oct 12 2020, 7:44 AM
rogfer01 updated this revision to Diff 297586.Oct 12 2020, 7:46 AM

ChangeLog:

  • Forgot to add the test for RISC-V

The code makes sense to me and it seems correct, but there may be assumptions about the way that mechanism operates that I'm not familiar with. Some comments:

  • Thanks for the thorough work for this patch and D89237. Much better than just tweaking the RISC-V stack spill limit, sweeping the issue under the rug and calling it a day!
  • It might be a good idea to add additional reviewers, to check the tests for the other targets?
  • Nit: could the RISC-V test be further minimized? It still seems a bit big/complex. I'm assuming that's the minimized version of the test suite file that was failing to compile. Maybe also add a small comment in that file explaining what's being tested and describing the origin of the test.
  • Could there be cases (due to future changes elsewhere) where these tests still pass but are no longer checking the conditions handled by the new code? If so, we could check that a debug run of the test prints one of the new debug strings (probably the one inside the loop). That's done in llvm/test/CodeGen/RISCV/disjoint.ll, by using REQUIRES: asserts and -debug-only=....
efriedma added inline comments.
llvm/lib/CodeGen/PrologEpilogInserter.cpp
1085

If the stack alignment is large enough that all immediate offsets from the stack pointer are stuck in padding, the performance of the resulting code is going to be terrible, period. Really, we want the gap somewhere between the allocas that require it, and the rest of the stack allocations. Changing the location of the scavenging slot, specifically, seems a bit hacky. Maybe it's good enough, though.

On a related note, I'm tempted to say we should add a mechanism to allow targets to cap stack realignment to something sane, to allow avoiding situations like this. (We currently have isStackRealignable(), but that completely disables realignment.)

Looking into the origin of this problem, git bisect points at cb82de29601 but I don't think it is the culprit. Looks like we may have been a bit unlucky during register allocation with the new sequence of instructions.

After cb82de29601, -stop-before=prologepilog shows we need a couple more of spill slots for RegAlloc. It is accessing to one of them what triggers the issue with the register scavenger and the emergency spill slot.

simoll added a subscriber: simoll.Nov 4 2020, 6:29 AM
rogfer01 updated this revision to Diff 303711.Nov 8 2020, 5:15 AM

ChangeLog:

  • Use llvm-reduce to reduce the test even further
  • Use update_llc_test_checks.py to reduce maintenance burden of CHECK lines
  • Add DEBUG checks so we can tell PEI is adjusting the offsets.

The code makes sense to me and it seems correct, but there may be assumptions about the way that mechanism operates that I'm not familiar with. Some comments:

  • It might be a good idea to add additional reviewers, to check the tests for the other targets?

Sure!

@sdesmalen @MatzeB mind to take a look at the AArch64 changes, they seem OK to me.

@arsenm is the AMDGPU change OK?

  • Nit: could the RISC-V test be further minimized? It still seems a bit big/complex. I'm assuming that's the minimized version of the test suite file that was failing to compile. Maybe also add a small comment in that file explaining what's being tested and describing the origin of the test.

I was able to reduce it a bit more with llvm-reduce. Not very noticeable but it was worth updating the test.

  • Could there be cases (due to future changes elsewhere) where these tests still pass but are no longer checking the conditions handled by the new code? If so, we could check that a debug run of the test prints one of the new debug strings (probably the one inside the loop). That's done in llvm/test/CodeGen/RISCV/disjoint.ll, by using REQUIRES: asserts and -debug-only=....

So I've done two things here:

  • to reduce the burden of maintaning the test I used update_llc_test_checks.py this time.
  • I included your suggestion of using the debug message plus a REQUIRES: asserts so we can ensure we are going through the specific new path of PEI.
sdesmalen added inline comments.Nov 9 2020, 6:25 AM
llvm/lib/CodeGen/PrologEpilogInserter.cpp
1092

Is this approach the same as setting the alignment of the last SFI to StackAlign?

llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

This test seems quite substantial, is there no simpler IR to test this?

arichardson added inline comments.
llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

This test can definitely be reduced a lot more. Also you could run just this pass and make it a MIR test instead.

rogfer01 added inline comments.Nov 13 2020, 12:53 AM
llvm/lib/CodeGen/PrologEpilogInserter.cpp
1092

Not sure I understood you comment.

This removes the additional offset that was introduced in line 1081 for all the SFIs, not only the last one.

llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

The test is already an MIR test. The YAML representation contains a copy of the IR (which it is required to create essential data structures such as llvm::Function). I can remove most of it as I understand adds none to little information here.

I will look into how to reduce the large amount of machine instructions: part of the difficulty is making sure no register is left for the register scavenger when eliminating the frame index of the problematic stack slot (the one that used to be, before this change, too far from the top of the stack).

I'll look into this again because I think it is possible to do so.

arichardson added inline comments.Nov 13 2020, 5:14 AM
llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

I meant using -run-pass=prologepilog and using update_mir_checks.py to generate the CHECK lines.

I just ran creduce over your test case and it gave me this:

# RUN: llc -mtriple riscv64 -start-before=prologepilog -o - -verify-machineinstrs %s
name:            a
stack:
  - { id: 0 , size: 8, alignment: 4096 }
body:             |
  bb.0:
    $= ADD $, $x10
    SD $x12, $, $x13
    $= DIVU $, $x30
    $= DIVU $, $x29
    $= DIV $, $x15
    $= DIV $, $x14
    $= DIV $, $x1
    $= DIV $, $x7
    $= DIV $, $x6
    $= DIV $, $x31
    $= DIV $, $x28
    $= DIV $, $x5
    $= DIV $, $x17
    $= DIV $, $x16
    $= DIV $x11, $

Feeding that into /bin/llc -mtriple riscv64 -start-before=prologepilog -o - -verify-machineinstrs -stop-before=prologepilog
yields this:

--- |
  ; ModuleID = '/local/scratch/alr48/cheri/build/llvm-project-build/test-reduce.mir'
  source_filename = "/local/scratch/alr48/cheri/build/llvm-project-build/test-reduce.mir"
  target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
  target triple = "riscv64"

  define void @a() {
  entry:
    unreachable
  }

...
---
name:            a
alignment:       1
exposesReturnsTwice: false
legalized:       false
regBankSelected: false
selected:        false
failedISel:      false
tracksRegLiveness: false
hasWinCFI:       false
registers:       []
liveins:         []
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap:     false
  hasPatchPoint:   false
  stackSize:       0
  offsetAdjustment: 0
  maxAlignment:    4096
  adjustsStack:    false
  hasCalls:        false
  stackProtector:  ''
  maxCallFrameSize: 4294967295
  cvBytesOfCalleeSavedRegisters: 0
  hasOpaqueSPAdjustment: false
  hasVAStart:      false
  hasMustTailInVarArgFunc: false
  localFrameSize:  0
  savePoint:       ''
  restorePoint:    ''
fixedStack:      []
stack:
  - { id: 0, name: '', type: default, offset: 0, size: 8, alignment: 4096,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
callSites:       []
constants:       []
machineFunctionInfo: {}
body:             |
  bb.0:
    $noreg = ADD $noreg, $x10
    SD $x12, $noreg, $x13
    $noreg = DIVU $noreg, $x30
    $noreg = DIVU $noreg, $x29
    $noreg = DIV $noreg, $x15
    $noreg = DIV $noreg, $x14
    $noreg = DIV $noreg, $x1
    $noreg = DIV $noreg, $x7
    $noreg = DIV $noreg, $x6
    $noreg = DIV $noreg, $x31
    $noreg = DIV $noreg, $x28
    $noreg = DIV $noreg, $x5
    $noreg = DIV $noreg, $x17
    $noreg = DIV $noreg, $x16
    $noreg = DIV $x11, $noreg

...
rogfer01 added inline comments.Nov 14 2020, 6:11 AM
llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

Thanks @arichardson for the reduced testcase.

However, I'm not sure we'd be testing the right thing here. We need a frame index that when eliminated has to use the emergency spill slot. Something like the SD instruction in line 500:

SD killed renamable $x12, %stack.1, 0

Maybe I'm missing something.

arichardson added inline comments.Nov 14 2020, 6:14 AM
llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir
2

Yeah it probably needs something like this.

The reduced testcase I posted here was just what creduce gave me for the error message that I see when compiling the LLVM test suite right now: LLVM ERROR: Incomplete scavenging after 2nd pass

@rogfer01 the changes to the aarch64 tests seem fine to me.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
1092

You're right, these are all allocated individually, so changing the alignment of the last SFI would indeed only affect the position of that one.

I guess I was thinking along the lines to what is done for the Local stack allocation block. i.e. allocate the SFIs together as one block (in this case aligned to the stack alignment), and then adjust the individual offsets for each of the SFIs? That would place the logic for this in one place. Now it first assigns offsets for the slots, then calculates stack alignment, and then fixes up the offsets based on the stack alignment. Is that something you have considered?

rogfer01 updated this revision to Diff 315634.Jan 9 2021, 1:29 PM

ChangeLog:

  • Largely simplify the testcase by making a call that uses all the eligible registers forcing the RegisterScavenger to use the emergency spill.
rogfer01 added inline comments.Jan 9 2021, 1:58 PM
llvm/lib/CodeGen/PrologEpilogInserter.cpp
1092

Hi Sander, thanks for the suggestion. I hadn't considered it. I understand you suggest to do the adjustment earlier, in the loop in line 1053.

However I took a look and so far it does not seem an obvious thing to me. It seems I still have to take into account things like !TFI.targetHandlesStackFrameRounding() and MFI.adjustsStack() || MFI.hasVarSizedObjects(), etc. Also MaxAlign is passed by reference to AdjustStackoffset and may change during the loop. I am afraid it would make the code a messier. I could do that after the loop (before 1056) but I still have to copy some of the checks and calculations that happen in the body of 1058.

sdesmalen accepted this revision.Jan 19 2021, 3:37 AM

LGTM, thanks @rogfer01!

llvm/lib/CodeGen/PrologEpilogInserter.cpp
1083

nit: constraints

1092

My thought was more to create a new single object that comprises all SFIs, that is either allocated early or late, possibly adjusted for stack alignment. And then have one loop that allocates the individual SFIs 'within' that object, by setting offsets based on the encompassing object's start address. But thinking this through, it probably wouldn't be much of an improvement to the current mechanism, as it still requires allocating an object and adjusting it's offset.

Apologies if I have stalled the patch with this suggestion.

This revision is now accepted and ready to land.Jan 19 2021, 3:37 AM