This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics.
ClosedPublic

Authored by fakepaper56 on Jul 24 2022, 8:50 AM.

Details

Summary

The patch uses peephole method to fold merge.vvm and unmasked intrinsics to
masked intrinsics. Using peephole intead of tablegen patterns is to avoid large
auto gnerated code.

Note: The patch ignores segment loads since I don't know how to test them.

Diff Detail

Event Timeline

fakepaper56 created this revision.Jul 24 2022, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2022, 8:50 AM
fakepaper56 requested review of this revision.Jul 24 2022, 8:50 AM
reames requested changes to this revision.Jul 28 2022, 12:56 PM

Have you looked at doing this as a dag combine instead of a post-isel peephole? I believe we should have SDNodes with masks for all of these, doing the local peephole in a combine and letting it iterate seems more straight forward. (I'm probably missing something, right?)

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
154

Order of operations - it seems like this transform should probably be done in the prior loop. In particular, we probably want to fold the mask into the prior instruction before doPeepholeMaskedRVV turns it into an unmasked variant.

2530

Extract out a helper function for the opcode matching please.

2559

I believe you have a missing check here. Don't we need the passthrough for True to be False? Otherwise, it seems like the vmerge might be merging two unrelated values?

We could potentially phrase this as an eager demanded lane style push back of the mask which wouldn't require that check, but that doesn't seem consistent with your code below.

2608

This doesn't make any sense. The only way this could succeed is if the mask was all ones; we shouldn't have a merge with an all ones mask to begin with right?

llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
1

Can you precommit these tests?

Please also add tests for fixed-length vectors with explicit vector length set on the runline. I remember looking at this a bit before and convincing myself there were some complicating issues with fixed for this, but not what those were unfortunately.

This revision now requires changes to proceed.Jul 28 2022, 12:56 PM

Have you looked at doing this as a dag combine instead of a post-isel peephole? I believe we should have SDNodes with masks for all of these, doing the local peephole in a combine and letting it iterate seems more straight forward. (I'm probably missing something, right?)

Most of the _VL nodes don't have a passthru operand right now. Nor do they have a policy operand. Important since this patch is focusing on the tail undisturbed case. That would need to change before this could be a DAG combine.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2559

This is only handling the case for True being an unmasked, tail agnostic instruction. That's why it calls lookupMaskedIntrinsicByUnmaskedTA below.

2608

A vp.merge intrinsic can be emitted by the BSC vectorizer to represent a tail undisturbed merge. This happens in reduction loops where the last iteration has less elements than the previous iteration. So we need to merge the upper elements from a previous iteration under a tail undisturbed policy. The mask on the vp.merge would be all 1s.

Have you looked at doing this as a dag combine instead of a post-isel peephole? I believe we should have SDNodes with masks for all of these, doing the local peephole in a combine and letting it iterate seems more straight forward. (I'm probably missing something, right?)

Most of the _VL nodes don't have a passthru operand right now. Nor do they have a policy operand. Important since this patch is focusing on the tail undisturbed case. That would need to change before this could be a DAG combine.

Not a strongly held opinion but... maybe we should just do that?

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2559

Right, but that's insufficient.

Converting:
vadd v1, v2, v3
vmul v4, v5, v6
vmerge v7, v1, v4, v0

To:
vadd v7 v2, v3, v0
vmul v4, v5, v6

Doesn't seem like what we want?

There's a variant here which would be legal, but it doesn't seem to be the transform implemented here. That would be:
vadd v7 v2, v3, v0
v0 = not v0
vmul v7, v5, v6, v0

2608

Wouldn't an insert_sub_vector be a better representation here? Or is your point that we had an insert_sub_vector, and that got lowered to the vmerge? I'm guessing yes, that's your point because thinking that through seems to make sense.

I guess we don't have a dedicated insert_sub_reg analogous instruction. VADD could work - in analogy to scalar move idioms - but I don't have a strong reason to prefer that over vmerge if that's what we're currently using.

craig.topper added inline comments.Jul 28 2022, 1:49 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2559

Note the earlier check that the Merge operand and False operand of the VMERGE are the same.

2608

The register size didn’t change. Only the VL. There is no statically sized subvector.

fakepaper56 added inline comments.Jul 28 2022, 6:02 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
154

The patch is try to work on unmasked intrinsics and some unmasked intrinsics are generated by doPeepholeMaskedRVV. Take an example, vp.add with all-one mask need doPeepholeMaskedRVV to be transformed to unmasked intrinsic.

2559

Yeah. I thought about the scenario during writing the patch, but I wanted to decrease the complexity of the patch and I will do those work in the future. Maybe I could add some TODO in the comments?

  1. Use a lambda function to identify a node is VMERGE TU node.
  2. Add comments.
fakepaper56 marked an inline comment as done.Jul 29 2022, 1:11 AM
fakepaper56 added inline comments.
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2530

Done.

fakepaper56 marked an inline comment as done.

Rebase to precommited test revision D130753.

craig.topper added inline comments.Aug 2 2022, 10:33 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2610

This does not handle the chain output of True correctly if it has users. We would need to replace True.getValue(1) with N->getValue(1). ReplaceUses will only replace the direct users of N.

khchen added a subscriber: khchen.Aug 2 2022, 10:49 AM
craig.topper added inline comments.Aug 2 2022, 12:24 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2610

Simple test case

define void @vpmerge_vpload_store(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> * %p, <vscale x 2 x i1> %m, i32 zeroext %vl) {
; CHECK-LABEL: vpmerge_vpload_store:                                             
; CHECK:       # %bb.0:                                                          
; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, mu                               
; CHECK-NEXT:    vle32.v v8, (a0), v0.t                                          
; CHECK-NEXT:    vs1r.v v8, (a0)                                                 
; CHECK-NEXT:    ret                                                             
  %splat = insertelement <vscale x 2 x i1> poison, i1 -1, i32 0                  
  %mask = shufflevector <vscale x 2 x i1> %splat, <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer
  %a = call <vscale x 2 x i32> @llvm.vp.load.nxv2i32.p0nxv2i32(<vscale x 2 x i32> * %p, <vscale x 2 x i1> %mask, i32 %vl)
  %b = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %passthru, i32 %vl)
  store <vscale x 2 x i32> %b, <vscale x 2 x i32> * %p                           
  ret void                                                                       
}

Right after isel the MachineIR is

# Machine code for function vpmerge_vpload_store: IsSSA, TracksLiveness          
Function Live Ins: $v8 in %0, $x10 in %1, $v0 in %2, $x11 in %3                  
                                                                                 
bb.0 (%ir-block.0):                                                              
  liveins: $v8, $x10, $v0, $x11                                                  
  %3:gprnox0 = COPY $x11                                                         
  %2:vr = COPY $v0                                                               
  %1:gpr = COPY $x10                                                             
  %0:vrnov0 = COPY $v8                                                           
  %4:vr = PseudoVLE32_V_M1 %1:gpr, %3:gprnox0, 5 :: (load unknown-size from %ir.p, align 8)
  $v0 = COPY %2:vr                                                               
  %5:vrnov0 = PseudoVLE32_V_M1_MASK %0:vrnov0(tied-def 0), %1:gpr, $v0, %3:gprnox0, 5, 0
  VS1R_V killed %5:vrnov0, %1:gpr :: (store unknown-size into %ir.p, align 8)    
  PseudoRET                                                                      
                                                                                 
# End machine code for function vpmerge_vpload_store.

Notice the two VLEs. Dead code elimination will eventually delete the extra one, but it shouldn't have to. For a more complex test we might put loads and stores in the wrong order in the MachineIR.

Address Craig's comment and add machine ir tests.

fakepaper56 marked 2 inline comments as done.Aug 3 2022, 12:42 AM
fakepaper56 added inline comments.
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2610

Done. Thank you find the bug.

fakepaper56 marked an inline comment as done.

Refine commit message.

craig.topper added inline comments.Aug 3 2022, 9:20 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2614

Assert that True.getResNo() == 0

Address Craig's comment by adding assertion about ResNo of True.

Rebase and ping.

@reames Do you have any concerns?

@reames Do you have any concerns?

I spoke with @reames yesterday and I think he's ok with this. Philip, can you confirm?

@reames Do you have any concerns?

I spoke with @reames yesterday and I think he's ok with this. Philip, can you confirm?

Go for it. @craig.topper convinced me I was wrong about the potential correctness issue, and I have no strong opinion on whether this is the right approach. If it's not, we can always change course and remove the code later.

MERGE_VVM -> VMERGE_VVM, merge.vvm -> vmerge.vvm

I am sorry that I updated the revision to fix typos after the agreement and I will land the revision 48 hours later if nobody will give comments.

Go for it. @craig.topper convinced me I was wrong about the potential correctness issue, and I have no strong opinion on whether this is the right approach. If it's not, we can always change course and remove the code later.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 11 2022, 2:58 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.