This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Don't assume tail undefined if there's no policy op
Changes PlannedPublic

Authored by luke on Jun 14 2023, 2:21 PM.

Details

Summary

A masked pseudo without a policy operand implicitly has a tail
undisturbed, masked undisturbed policy. Previously, doPeepholeMaskedRVV
assumed that without the policy op it was tail undefined, and discarded
the merge operand.

This patch addresses this by using tail undisturbed by default, and then
only when the merge operand is IMPLICIT_DEF can it relax it to tail
undefined.

We don't need to check if the policy operand was tail agnostic anymore,
as we can only relax this to tail undefined if IMPLICIT_DEF is set
anyway.

Diff Detail

Event Timeline

luke created this revision.Jun 14 2023, 2:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2023, 2:21 PM
luke requested review of this revision.Jun 14 2023, 2:21 PM
luke added inline comments.Jun 14 2023, 2:26 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3200

I think we need to add a pseudo variant for VPseudoBinaryM (and potentially others) that allows us to represent tail agnostic without a mask. @reames How would this work with the TU_ canonicalisation plan? Could we get away with adding a TU_ variant and including a policy operand, where we set ForceTailAgnostic=1?

This comment was removed by craig.topper.
llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
475 ↗(On Diff #531514)

Was something other than compares affected that changed this test?

luke added inline comments.Jun 14 2023, 2:59 PM
llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
475 ↗(On Diff #531514)

Yeah, this changes the interaction with the vmerge fold combine. The combine turns this DAG:

t27: nxv2i32 = PseudoVADD_VV_M1_MASK IMPLICIT_DEF:nxv2i32, t4, t6, Register:nxv2i1 $v0, t8, TargetConstant:i64<5>, TargetConstant:i64<1>, t35:1
  t4: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
    t0: ch,glue = EntryToken
  t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %2
  t8: i64,ch = CopyFromReg t0, Register:i64 %3
  t35: ch,glue = CopyToReg t0, Register:nxv2i1 $v0, t30
    t30: nxv2i1 = PseudoVMSET_M_B2 TargetConstant:i64<-1>, TargetConstant:i64<0>

Into this:

t39: nxv2i32 = PseudoVADD_VV_M1_MASK t2, t4, t6, Register:nxv2i1 $v0, t8, TargetConstant:i64<5>, TargetConstant:i64<0>, t33:1
  t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
    t0: ch,glue = EntryToken
  t4: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
  t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %2
  t8: i64,ch = CopyFromReg t0, Register:i64 %3
  t33: ch,glue = CopyToReg t0, Register:nxv2i1 $v0, t30
    t30: nxv2i1 = PseudoVMSET_M_B2 TargetConstant:i64<-1>, TargetConstant:i64<0>

I.e. the merge operand in PseudoVADD_VV_M1_MASK is now set to something that isn't implicit_def.
But previously doPeepholeMaskedRVV would have converted this into a tail undefined PseudoVADD_VV_M1 anyway, even though we need to preserve the merge operand. The tail undefined then got translated into a ta policy.

luke added inline comments.Jun 14 2023, 3:07 PM
llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
475 ↗(On Diff #531514)

*turns this node:

luke edited the summary of this revision. (Show Details)Jun 15 2023, 5:25 AM
luke added inline comments.Jun 15 2023, 7:21 AM
llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
36

https://reviews.llvm.org/D135122 is the reason why the RISCVISD::SETCC_VL nodes have the mask as both the passthru and mask operand.
Do we still need to set v0 here to be correct?

Can I ask you to hold this patch for a day or so? I've got an idea on correctly inferring tail undefined here that I want to try out. If it works, this becomes much closer to NFC.

Can I ask you to hold this patch for a day or so? I've got an idea on correctly inferring tail undefined here that I want to try out. If it works, this becomes much closer to NFC.

The more complicated idea turned out not to be needed, and instead I posted the mini series ending in https://reviews.llvm.org/D153070.

I think if you rebase on that, some of your diffs should disappear. I am curious as to the case which requires the _TU pseudo for VPatBinaryM. I didn't run into that with my change, and I suspect your refactor here is creating more TU nodes.

llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll
698

This delta is very odd as v0 doesn't appear to be used. Why is this not getting DCE?

luke added inline comments.Jun 19 2023, 6:59 AM
llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll
698

v0 is being used as the passthrough for the vmflt.vf below. These new v0s come from these setcc_vl nodes which have vmset as both the passthrough and mask operands:

t51: nxv8i1 = RISCVISD::SETCC_VL t49, t50, setolt:ch, t62, t62, t78
t62: nxv8i1 = RISCVISD::VMSET_VL Register:i64 $x0

It's difficult to see the diff in these test cases, but I'm now convinced that the vmset.m v0 is necessary to be correct. Here's a reduced test case which shows the issue a bit better:

define <vscale x 1 x i1> @vmflt_allonesmask(<vscale x 1 x float> %a, <vscale x 1 x float> %b, i64 %vl, ptr %p) {
entry:
  %head = insertelement <vscale x 1 x i1> poison, i1 true, i32 0
  %mask = shufflevector <vscale x 1 x i1> %head, <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
  %passthru = load <vscale x 1 x i1>, ptr %p
  %v = call <vscale x 1 x i1> @llvm.riscv.vmflt.mask.nxv1f32(
    <vscale x 1 x i1> %passthru,
    <vscale x 1 x float> %a,
    <vscale x 1 x float> %b,
    <vscale x 1 x i1> %mask,
    i64 %vl)
  ret <vscale x 1 x i1> %v
}

On head@6947db2778e0f4799f5311bc80fe7963aa8409c6 this generates:

vmflt_allonesmask:                      # @vmflt_allonesmask
        .cfi_startproc
# %bb.0:                                # %entry
        vsetvli zero, a0, e32, mf2, ta, ma
        vmflt.vv        v0, v8, v9
        ret

Which I believe is wrong because we still need to load the passthrough into v0 to preserve tail agnostic semantics.

The intrinsic is selected as:
t14: nxv1i1 = PseudoVMFLT_VV_MF2_MASK nofpexcept t20, t2, t4, Register:nxv1i1 $v0, t6, TargetConstant:i64<5>, t22:1

And then the post-isel peephole discards the passthrough t20:
t25: nxv1i1 = PseudoVMFLT_VV_MF2 nofpexcept t2, t4, t6, TargetConstant:i64<5>

With the patch we get:

vmflt_allonesmask:                      # @vmflt_allonesmask
        .cfi_startproc
# %bb.0:                                # %entry
        vsetvli a2, zero, e8, mf8, ta, ma
        vlm.v   v0, (a1)
        vsetvli zero, a0, e32, mf2, ta, ma
        vmflt.vv        v0, v8, v9
        ret

I'll add these smaller cases as a precommit test to show this better.

luke added a comment.Jun 19 2023, 7:07 AM

Can I ask you to hold this patch for a day or so? I've got an idea on correctly inferring tail undefined here that I want to try out. If it works, this becomes much closer to NFC.

The more complicated idea turned out not to be needed, and instead I posted the mini series ending in https://reviews.llvm.org/D153070.

I think if you rebase on that, some of your diffs should disappear. I am curious as to the case which requires the _TU pseudo for VPatBinaryM. I didn't run into that with my change, and I suspect your refactor here is creating more TU nodes.

Rebasing removed the diffs in rvv-peephole-vmerge-vops.ll, allone-masked-to-unmasked.ll and fixed-vectors-peephole-vmerge-vops.ll, but the other diffs remain. Will update this revision shortly

luke updated this revision to Diff 532660.Jun 19 2023, 8:02 AM

Rebase, handle the case where the unmasked pseudo has a policy operand but the masked pseudo doesn't

luke planned changes to this revision.Aug 2 2023, 8:32 AM

After the intrinsics refactoring this patch needs reworking