This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Fold vmerge into its ops with smaller VL if known
ClosedPublic

Authored by luke on Jul 12 2023, 4:30 AM.

Details

Summary

Currently when folding vmerge into its operands, we stop if the VLs aren't
identical. However since the body of (vmerge (vop)) is the intersection of
vmerge and vop's bodies, we can use the smaller of the two VLs if we know it
ahead of time. This patch relaxes the constraint on VL if they are both
constants, or if either of them are VLMAX.

Diff Detail

Event Timeline

luke created this revision.Jul 12 2023, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 4:30 AM
luke requested review of this revision.Jul 12 2023, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 4:30 AM
fakepaper56 added inline comments.Jul 12 2023, 5:37 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3338

Is the below code more concise?

if (auto *CLHS = dyn_cast<ConstantSDNode>(LHS))
  if (auto *CRHS = dyn_cast<ConstantSDNode>(RHS))
luke updated this revision to Diff 539986.Jul 13 2023, 6:04 AM
luke marked an inline comment as done.

Address review comment

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3338

Yeah that's much more readable, thanks!

reames requested changes to this revision.Jul 13 2023, 1:28 PM

Generally seems correct, but enough minor comments that I want to see a revision before giving an LGTM.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3331–3348

Naming: GetMinVL

3338

Suggestion:

auto *CLHS = dyn_cast<ConstantSDNode>(LHS);
auto *CRHS = dyn_cast<ConstantSDNode>(RHS);
if (!CRHS || !CLHS)
  return SDValue();
return CLHS->getZExtValue() <= CRHS->getZExtValue() ? LHS : RHS;
3344

Nit pick - have the same merge operand *or the merge on the True instruction doesn't exist and is thus undef*.

3345

This comment is awfully verbose, and I don't think the example helps much.

3354

The comment change needs reverted.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect.ll
250–251

Totally off topic for this review, but we can do way better for odd sized mask loads. The scalarization code here already touches all bits in the byte, we can just do a single byte vector load and mask off the high bits directly.

Not sure we care, just noticed it as I was looking at this test.

This revision now requires changes to proceed.Jul 13 2023, 1:28 PM
luke added inline comments.Jul 14 2023, 2:46 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3344

My mental model of this is that we need to have the same merge operand for both, but if the merge operand on True is undefined or missing, then we assume it's the same merge operand as N. Could we make this reasoning more explicit in the preceding code?

3345

After reading this the next day, I agree. I think I was using the comments as a mental scratch pad

luke added inline comments.Jul 14 2023, 3:14 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3344

Something like:

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index d956b303e584..6a084efa0a8f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3245,18 +3245,20 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   if (!Info)
     return false;
 
-  if (HasTiedDest && !isImplicitDef(True->getOperand(0))) {
-    // The vmerge instruction must be TU.
-    // FIXME: This could be relaxed, but we need to handle the policy for the
-    // resulting op correctly.
-    if (isImplicitDef(Merge))
-      return false;
-    SDValue MergeOpTrue = True->getOperand(0);
-    // Both the vmerge instruction and the True instruction must have the same
-    // merge operand.
-    if (False != MergeOpTrue)
-      return false;
-  }
+  SDValue TrueMerge = HasTiedDest ? True->getOperand(0) : False;
+  if (isImplicitDef(TrueMerge))
+    TrueMerge = False;
+
+  // Both the vmerge instruction and the True instruction must have the same
+  // merge operand.
+  if (TrueMerge != False)
+    return false;
+
+  // The vmerge instruction must be TU.
+  // FIXME: This could be relaxed, but we need to handle the policy for the
+  // resulting op correctly.
+  if (isImplicitDef(Merge) && !isImplicitDef(TrueMerge))
+    return false;
 
   if (IsMasked) {
     assert(HasTiedDest && "Expected tied dest");
luke updated this revision to Diff 540345.Jul 14 2023, 3:17 AM

Address some review comments

reames accepted this revision.Jul 18 2023, 9:37 AM

LGTM

This revision is now accepted and ready to land.Jul 18 2023, 9:37 AM
luke marked 2 inline comments as not done.Jul 19 2023, 7:28 AM
luke added inline comments.
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3344

@reames Do you have any thoughts on the above?

This revision was landed with ongoing or failed builds.Jul 19 2023, 9:25 AM
This revision was automatically updated to reflect the committed changes.
luke marked an inline comment as not done.

I think I'm seeing a miscompile from this

I have a tail undefined unmasked vrgather.vi with VL=2 followed by a tail undefined vmerge with VL=4. This gets combined to a tail agnostic masked undisturbed masked vrgather.vi with VL =2. I think this needs to be a tail undisturbed VL to account for elements 2 and 3 from the vmerge becoming part of the tail?

Here's a hack to avoid the buggy case

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index dfa54740c0d6..ad3f0e98d12f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3629,7 +3629,7 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   SDValue TrueVL = True.getOperand(TrueVLIndex);
   SDValue SEW = True.getOperand(TrueVLIndex + 1);
 
-  auto GetMinVL = [](SDValue LHS, SDValue RHS) {
+  auto GetMinVL = [&](SDValue LHS, SDValue RHS) {
     if (LHS == RHS)
       return LHS;
     if (isAllOnesConstant(LHS))
@@ -3640,7 +3640,11 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
     auto *CRHS = dyn_cast<ConstantSDNode>(RHS);
     if (!CLHS || !CRHS)
       return SDValue();
-    return CLHS->getZExtValue() <= CRHS->getZExtValue() ? LHS : RHS;
+    if (CRHS->getZExtValue() <= CLHS->getZExtValue())
+      return RHS;
+    if (isImplicitDef(Merge))
+      return SDValue();
+    return LHS;
   };
 
   // Because N and True must have the same merge operand (or True's operand is
craig.topper added inline comments.Aug 16 2023, 7:24 PM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect.ll
62

I think this shows the bug. This is a ta load, formed from a vl=6 load and vl=8 merge both ta. Elements 6 and 7 are now part of the tail after merging. Elements 6 and 7 were defined if the mask was 0 for those elements. We're now ignoring mask bits 6 and 7 with the reduce vl. So we need to use tu to make them defined.