This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Reserve an emergency spill slot for any RVV spills
ClosedPublic

Authored by frasercrmck on May 27 2021, 10:44 AM.

Details

Summary

This patch addresses an issue in which fixed-length (VLS) vector RVV
code could fail to reserve an emergency spill slot for their frame index
elimination. This is because we were previously only reserving a spill
slot when there were scalable-vector frame indices being used.
However, fixed-length codegen uses regular-type frame indices if it
needs to spill.

This patch does the fairly brute-force method of checking ahead of time
whether the function contains any RVV spill instructions, in which case
it reserves one slot. Note that the second RVV slot is still only
reserved for scalable-vector frame indices.

This unfortunately causes quite a bit of churn in existing tests, where
we chop and change stack offsets for spill slots.

Diff Detail

Event Timeline

frasercrmck created this revision.May 27 2021, 10:44 AM
frasercrmck requested review of this revision.May 27 2021, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2021, 10:44 AM

@HsiangKai I believe you said in D100574 that you'd identified this issue in your downstream testing?

craig.topper added inline comments.May 27 2021, 10:49 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1443

Is this checking for any RVV instruction rather than just spills?

frasercrmck added inline comments.May 27 2021, 10:59 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1443

Ah yes, it must be getting late. I hoisted it out of those tasteful lambdas where it was also checking the presence of FI operands. Even that isn't technically just checking for spills since it could and should detect a stack reference in a vadd.vx (for whatever reason).

Do you think this method sufficiently encapsulates something like isRVVInstruction and I can just rename it to that?

  • rebase
  • try to find good names for the various helper functions
HsiangKai added a comment.EditedJun 1 2021, 11:07 PM

In our downstream implementation, we do not go through SelectBaseAddr for vector load/store to generate ADDI for the base address. In this way, we have no need to reserve one additional scavenging slot for vector load/store.

Some snippet looks like this.

// Load
-  def : Pat<(vti.Vector (riscv_vle_vl BaseAddr:$rs1, VLOpFrag)),
-            (load_instr BaseAddr:$rs1, GPR:$vl, vti.Log2SEW)>;
+  def : Pat<(vti.Vector (riscv_vle_vl GPR:$rs1, VLOpFrag)),
+            (load_instr GPR:$rs1, GPR:$vl, vti.Log2SEW)>;
   // Store
-  def : Pat<(riscv_vse_vl (vti.Vector vti.RegClass:$rs2), BaseAddr:$rs1,
+  def : Pat<(riscv_vse_vl (vti.Vector vti.RegClass:$rs2), GPR:$rs1,
                           VLOpFrag),
-            (store_instr vti.RegClass:$rs2, BaseAddr:$rs1, GPR:$vl, vti.Log2SEW)>;
+            (store_instr vti.RegClass:$rs2, GPR:$rs1, GPR:$vl, vti.Log2SEW)>;

The drawback is it will generate redundant addi xa, xb, 0 if the offset is 0.

-; LMULMAX2-RV32-NEXT:    vsetivli a1, 8, e16,m1,ta,mu
-; LMULMAX2-RV32-NEXT:    vle16.v v25, (sp)
+; LMULMAX2-RV32-NEXT:    mv a1, sp
+; LMULMAX2-RV32-NEXT:    vsetivli a2, 8, e16,m1,ta,mu
+; LMULMAX2-RV32-NEXT:    vle16.v v25, (a1)

Your implementation is the most precise way to detect the vector load/store frame objects. It scans all the instructions and all the operands in the instruction to detect it.
What is the impact of the compile time if we scan all these instructions in the frame lowering?

In our downstream implementation, we do not go through SelectBaseAddr for vector load/store to generate ADDI for the base address. In this way, we have no need to reserve one additional scavenging slot for vector load/store.

Don't we also set a flag in RISCVMachineFunctionInfo when we emit a spill/reload?

Ah that's interesting, thanks both for the input. I've actually done both of those things in other backends and wasn't a huge fan of either approach. I found the flag to be quite brittle and didn't like how we had to clean up the redundant addressing instructions to get the best code.

I don't anticipate a large increase in compile time, since it's only called once per function and should have early exits. One thing I forgot to include was that hasRVVSpillWithFIs can (presumably) safely return false if !Subtarget.hasStdExtV(). That will limit the impact on non-V code. The flag would no doubt be faster, of course, in terms of compile time. I can run some tests on the larger kernels we're producing if you like.

Ah that's interesting, thanks both for the input. I've actually done both of those things in other backends and wasn't a huge fan of either approach. I found the flag to be quite brittle and didn't like how we had to clean up the redundant addressing instructions to get the best code.

I don't anticipate a large increase in compile time, since it's only called once per function and should have early exits. One thing I forgot to include was that hasRVVSpillWithFIs can (presumably) safely return false if !Subtarget.hasStdExtV(). That will limit the impact on non-V code. The flag would no doubt be faster, of course, in terms of compile time. I can run some tests on the larger kernels we're producing if you like.

Add Subtarget.hasStdExtV() checking to avoid the impact on non-V code is a good idea.

Add Subtarget.hasStdExtV() checking to avoid the impact on non-V code is a good idea.

An interesting case shows up where the MIR test I added in this patch contains RVV instructions but doesn't pass -mattr=+experimental-v so the subtarget reports that hasStdExtV is false. It's presumably the user's responsibility to correctly pass mattr even with MIR. Though through this I've identified other tests like rvv/zvlsseg-copy.mir which don't pass -mattr either.

Assuming I've got this right, we should go through the MIR tests and make sure they pass the correct attrs. Something to keep in mind for future reviews.

  • rebase
  • early-exit for non-V functions
  • update test checks
HsiangKai added inline comments.Jun 2 2021, 8:04 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-emergency-slot.mir
2

I am not sure what is the difference between update_llc_test_checks.py and update_mir_test_checks.py. I see most of the MIR tests are updated by update_mir_test_checks.py. Just curious why this file is updated by update_llc_test_checks.py.

frasercrmck added inline comments.Jun 2 2021, 8:07 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-emergency-slot.mir
2

I'm not 100% on this, but I believe update_mir_test_checks.py will only add checks to the MIR, like when coming from -run-pass or -stop-after. You still need update_llc_test_checks.py if you want to check the asm.

This particular test was copied from out-of-each-emergency-slot.mir which checks the resulting asm.

HsiangKai added inline comments.Jun 2 2021, 8:13 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-emergency-slot.mir
2

Got it. Thanks for your explanation.

LGTM. Let @craig.topper review it again.

This revision is now accepted and ready to land.Jun 2 2021, 11:28 AM
jrtc27 added inline comments.Jun 2 2021, 11:34 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-emergency-slot.mir
2

Yeah, it's about what type of output llc is producing, as the different tools use different patterns to capture the relevant output lines for CHECK lines. So IR->MIR and MIR->MIR use mir, whereas IR->asm and MIR->asm use llc. The confusion stems from the fact that, unlike C->IR->IR->asm, where you have clear cc1/opt/llc steps, llc gets used for both IR and MIR production. The names could perhaps be clearer, but I suspect update_mir_test_checks.py came after update_llc_test_checks.py already existed (see also update_test_checks.py which is for opt, that I assume came before all the others, and so has an even more confusing name; ideally it'd be a smart wrapper around all the others, with a dedicated update_opt_test_checks.py...).

HsiangKai added inline comments.Jun 3 2021, 1:51 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-emergency-slot.mir
2

@jrtc27, thanks for your detail explanation. It is very clear for me.

  • rebase
  • let the build bots have a go at it