Page MenuHomePhabricator

[RISCV] Fix missing emergency slots for scalable stack offsets
ClosedPublic

Authored by frasercrmck on Thu, Apr 15, 9:12 AM.

Details

Summary

This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.

This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.

Diff Detail

Event Timeline

frasercrmck created this revision.Thu, Apr 15, 9:12 AM
frasercrmck requested review of this revision.Thu, Apr 15, 9:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptThu, Apr 15, 9:12 AM
frasercrmck added inline comments.
llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
93

From what I can tell, we need to scavenge three registers to compute the stack offsets at this point. The scalable offsets certainly require two. After adding two extra slots, the code generated is:

renamable $x13 = nsw ADDI renamable $x16, -2
$x5 = LUI 1
$x9 = ADDIW killed $x5, -1896
$x9 = ADD $x2, killed $x9
$x1 = PseudoReadVLENB
$x5 = ADDI $x0, 50
$x1 = MUL killed $x1, killed $x5
$x5 = LD $x2, 8 :: (load 8 from %stack.17)
$x9 = ADD killed $x9, killed $x1
$x1 = LD $x2, 16 :: (load 8 from %stack.16)
renamable $v0 = PseudoVRELOAD_M1 killed $x9 :: (load unknown-size from %stack.1, align 8)

I was wondering if I've missed something in my analysis. Perhaps @rogfer01, @StephenFan or @Hsiang-Kai know more?

craig.topper added inline comments.Thu, Apr 15, 9:19 AM
llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
93

From what I can tell, we need to scavenge three registers to compute the stack offsets at this point. The scalable offsets certainly require two. After adding two extra slots, the code generated is:

renamable $x13 = nsw ADDI renamable $x16, -2
$x5 = LUI 1
$x9 = ADDIW killed $x5, -1896
$x9 = ADD $x2, killed $x9
$x1 = PseudoReadVLENB
$x5 = ADDI $x0, 50
$x1 = MUL killed $x1, killed $x5
$x5 = LD $x2, 8 :: (load 8 from %stack.17)
$x9 = ADD killed $x9, killed $x1
$x1 = LD $x2, 16 :: (load 8 from %stack.16)
renamable $v0 = PseudoVRELOAD_M1 killed $x9 :: (load unknown-size from %stack.1, align 8)

I was wondering if I've missed something in my analysis. Perhaps @rogfer01, @StephenFan or @Hsiang-Kai know more?

I believe we found this same issue internally. @HsiangKai has patch in our downstream branch.

  • add bruteforce "solution" of requiring two extra spill slots
frasercrmck added inline comments.Thu, Apr 15, 9:28 AM
llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
93

Ah that's good to know, thanks. I've updated this patch with what can be used to get the test passing, so it'll be interesting to see if there's a cleverer way.

And sorry for that incorrect hyphenated ping. It comes up first in the autocomplete suggestions.

Our downstream contains this additional change to reduce the number of virtual registers created

--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1353,33 +1353,32 @@ Register RISCVInstrInfo::getVLENFactoredAmount(MachineFunction &MF,
   DebugLoc DL = II->getDebugLoc();
   int64_t NumOfVReg = Amount / 8;

-  Register SizeOfVector = MRI.createVirtualRegister(&RISCV::GPRRegClass);
-  BuildMI(MBB, II, DL, TII->get(RISCV::PseudoReadVLENB), SizeOfVector);
-  Register FactorRegister = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  Register VL = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  BuildMI(MBB, II, DL, TII->get(RISCV::PseudoReadVLENB), VL);
   assert(isInt<12>(NumOfVReg) &&
          "Expect the number of vector registers within 12-bits.");
   if (isPowerOf2_32(NumOfVReg)) {
     uint32_t ShiftAmount = Log2_32(NumOfVReg);
     if (ShiftAmount == 0)
-      return SizeOfVector;
-    BuildMI(MBB, II, DL, TII->get(RISCV::SLLI), FactorRegister)
-        .addReg(SizeOfVector, RegState::Kill)
+      return VL;
+    BuildMI(MBB, II, DL, TII->get(RISCV::SLLI), VL)
+        .addReg(VL)
         .addImm(ShiftAmount);
   } else {
-    Register VN = MRI.createVirtualRegister(&RISCV::GPRRegClass);
-    BuildMI(MBB, II, DL, TII->get(RISCV::ADDI), VN)
+    Register N = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    BuildMI(MBB, II, DL, TII->get(RISCV::ADDI), N)
         .addReg(RISCV::X0)
         .addImm(NumOfVReg);
     if (!MF.getSubtarget<RISCVSubtarget>().hasStdExtM())
       MF.getFunction().getContext().diagnose(DiagnosticInfoUnsupported{
           MF.getFunction(),
           "M-extension must be enabled to calculate the vscaled size/offset."});
-    BuildMI(MBB, II, DL, TII->get(RISCV::MUL), FactorRegister)
-        .addReg(SizeOfVector, RegState::Kill)
-        .addReg(VN, RegState::Kill);
+    BuildMI(MBB, II, DL, TII->get(RISCV::MUL), VL)
+        .addReg(N, RegState::Kill)
+        .addReg(VL);
   }

-  return FactorRegister;
+  return VL;
 }

 Optional<std::pair<unsigned, unsigned>>
HsiangKai added inline comments.Thu, Apr 15, 6:34 PM
llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir
93

Ah that's good to know, thanks. I've updated this patch with what can be used to get the test passing, so it'll be interesting to see if there's a cleverer way.

And sorry for that incorrect hyphenated ping. It comes up first in the autocomplete suggestions.

The hyphenated @Hsiang-Kai may also be mine. I already forgot the password and I have no idea how to delete it. :(

HsiangKai added a comment.EditedThu, Apr 15, 6:43 PM

Sorry for not upstreaming this fix when I found it. I am busy on testing/debugging our downstream version recently. I think you need to update several tests after we reserved two more scavenging slots.

Thanks for the help. Do you want to create your own patch, or shall I incorporate your patch to reduce the number of registers into this (with attribution)?

Thanks for the help. Do you want to create your own patch, or shall I incorporate your patch to reduce the number of registers into this (with attribution)?

Feel free to put the virtual register reduction into this patch. Thanks.

HsiangKai added inline comments.Fri, Apr 16, 7:46 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
858

I found a bug in our downstream testing about the condition. We may need to change the condition to

const auto &STI = MF.getSubtarget<RISCVSubtarget>();
...
if (!isInt<11>(MFI.estimateStackSize(MF)) || STI.hasStdExtV())

The reason is we will use VLA instructions for VLS data set. If all the data are VLS types, RVVStackSize will be 0. However, we still need one scavenging slot because vector load/store have no immediate offset field in the instructions.

frasercrmck retitled this revision from [RISCV][WIP] Fix missing emergency slots for scalable stack offsets to [RISCV] Fix missing emergency slots for scalable stack offsets.Mon, Apr 19, 3:45 AM
frasercrmck edited the summary of this revision. (Show Details)
frasercrmck added inline comments.Mon, Apr 19, 3:50 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
858

That makes sense. Are we able to detect that case upfront, though? I'd hope that maybe we'd be able to distinguish a known stack size of 0 vs an "unknown" stack size with VLS types. Something conceptually like an Optional<int64_t> RVVStackSize. Then the condition would remain the same since None != 0.

Anyway, maybe that should be a separate patch with its own test case(s)?

HsiangKai added inline comments.Mon, Apr 19, 5:42 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
858

It makes sense to me. It should be a separate patch with test cases.

HsiangKai added inline comments.Mon, Apr 19, 5:56 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
863

What if !isInt<11>(MFI.estimateStackSize(MF)) and RVVStackSize != 0 occurred in the same function, do we need three or two emergency spill slot in total?

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1391

Sorry, it is my fault. We may have no need to switch the order of the operands.
Let's keep the order to reduce the difference in the test cases.

BuildMI(MBB, II, DL, TII->get(RISCV::MUL), VL)
    .addReg(VL, RegState::Kill)
    .addReg(N, RegState::Kill);
  • restore MUL operand order: reduce test noise
frasercrmck marked an inline comment as done.Tue, Apr 20, 12:55 AM
frasercrmck added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
863

I had assumed the worst case would still be two, but I'll run some tests.

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1391

Makes sense, the diff is better like this. Thanks

frasercrmck marked 2 inline comments as done.Tue, Apr 20, 1:00 AM
frasercrmck added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
863

In my simple testing, when both are true it's still two slots. I don't know if there's a more targeted test we can use for that (e.g. if fp vs sp would change things)

HsiangKai accepted this revision.Tue, Apr 20, 1:41 AM

LGTM.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
863

I think it is fine with this patch.

This revision is now accepted and ready to land.Tue, Apr 20, 1:41 AM
frasercrmck marked an inline comment as done.Tue, Apr 20, 3:14 AM
frasercrmck added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
863

I have actually just found this case, just a bit too late. I think it's possible to optimize it so that it still requires only two registers. I'll post up a follow-up patch shortly.