This is an archive of the discontinued LLVM Phabricator instance.


Differential D124089

[RISCV] Add a test showing incorrect VSETVLI insertion
ClosedPublic

Authored by frasercrmck on Apr 20 2022, 6:39 AM.


Details

Reviewers

craig.topper
rogfer01
jacquesguan

Commits

rG78c1dcbf1bb9: [RISCV] Add a test showing incorrect VSETVLI insertion

Summary

This test shows incorrect cross-bb insertion. We'd expect to see
a SEW=8 vsetvli, something like:

    vsetvli zero, zero, e8, mf8, ta, mu
    vluxei64.v      v1, (a2), v8, v0.t

But the vsetvli is omitted, and an inherited SEW=64 vsetvli is used
instead:

    vmv1r.v v9, v1
    vsetvli a3, zero, e64, m1, ta, mu
    vmseq.vi        v9, v1, 0
    vmv1r.v v8, v0
    vmandn.mm       v0, v9, v2
    beqz    a0, .LBB0_2
# %bb.1:
    vluxei64.v      v1, (a2), v8, v0.t
    vmv1r.v v3, v1

The "mask reg op" vmandn.mm in bb.1 appears to be confusing the insertion
process, as it is able to elide its own vsetvli as its VLMAX (SEW=8,
LMUL=MF8) is identical to the previous one (SEW=64, LMUL=1).
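
The VLMAX equality mentioned above can be checked with a little arithmetic. The following is a minimal sketch, assuming the RVV definition VLMAX = VLEN * LMUL / SEW and a hypothetical VLEN of 128 bits; the helper name `vlmax` is made up for illustration and is not part of LLVM.

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew: int, lmul: Fraction) -> Fraction:
    """Return VLMAX = VLEN * LMUL / SEW for the given configuration."""
    return Fraction(vlen_bits) * lmul / sew


VLEN = 128  # assumption: any legal VLEN gives the same conclusion

# Configuration demanded by the mask op (SEW=8, LMUL=1/8).
mask_op_vlmax = vlmax(VLEN, 8, Fraction(1, 8))
# Configuration inherited from the entry block (SEW=64, LMUL=1).
inherited_vlmax = vlmax(VLEN, 64, Fraction(1))

# Both configurations have SEW/LMUL = 64, so VLMAX matches for every VLEN.
print(mask_op_vlmax, inherited_vlmax)
```

Because only the SEW/LMUL ratio enters the formula, an instruction that cares only about VLMAX sees the two configurations as interchangeable, whereas the indexed load does not, since its data SEW differs.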

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. Apr 20 2022, 6:39 AM

Herald added a project: Restricted Project. Apr 20 2022, 6:39 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 26 others.

frasercrmck requested review of this revision. Apr 20 2022, 6:39 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

frasercrmck edited the summary of this revision. Apr 20 2022, 6:41 AM

It appears as if this is what D119518 is trying to achieve, but since we're not inserting any VSETVLIs in bb.1, CurInfo isn't valid and so we skip the insertion. I tried making it insert vsetvlis if CurInfo is invalid (out of safety) but that's perhaps too pessimistic, as then it inserts a VSETVLI at the end of bb.2 as well. We don't have the scope during phase 3 to work out that the emergency VSETVLI inserted at the end of bb.1 would cover that.

Unless we do something naive like that and then try to remove them later? We're obviously at the limits of what this pass can do, so I don't know if that's just another short-term solution.

Harbormaster completed remote builds in B160445: Diff 423893. Apr 20 2022, 7:15 AM

One interesting thing is that computeIncomingVLVTYPE doesn't seem to be fully aligned with what emitVSETVLIs will do. If the latter chooses to skip a vsetvl then the Exit of that basic block might potentially be different to the one that we determined in computeVLVTYPEChanges and computeIncomingVLVTYPE.

Maybe aligning computeIncomingVLVTYPE with the expectations of emitVSETVLIs is possible. It looks like once we have computed InInfo in computeIncomingVLVTYPE, we may have to run computeVLVTYPEChanges again for that block (and make sure there that we use the same skip criteria as in emitVSETVLIs). The latter would now receive the InInfo (in contrast to Phase 1, where it is unknown) and would compute a potentially different Exit value. This also suggests that Phase 1 might be embedded as part of Phase 2 once we have computed the InInfo. This might make the algorithm a bit slower.

D119518 mitigates the lack of alignment by reconciling both but it means that in your case (if we do CurInfo = BlockInfo[MBB.getNumber()].Pred; for the case in which we can skip a vsetvli due to the predecessors) we get an unnecessary change at the end of bb1 which we'd want to have in bb2 instead.
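
The mismatch described above can be reproduced in a toy model. The sketch below is not the pass's code: `VType`, `Inst`, `compatible`, `phase1_exit` and `phase3_emit` are hypothetical names, the CFG is simplified to a straight line (bb.0 -> bb.1 -> bb.2), and the phase numbering only loosely follows the one used in these comments. It assumes that a mask-register op is satisfied by any state with the same SEW/LMUL ratio.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VType:
    sew: int
    lmul: Fraction

    def ratio(self) -> Fraction:
        # SEW/LMUL; equal ratios imply equal VLMAX.
        return Fraction(self.sew) / self.lmul


@dataclass(frozen=True)
class Inst:
    name: str
    demand: VType
    ratio_only: bool = False  # mask-register ops only need a matching ratio


def compatible(inst: Inst, state: Optional[VType]) -> bool:
    if state is None:  # unknown state: a vsetvli is always needed
        return False
    if inst.ratio_only:
        return inst.demand.ratio() == state.ratio()
    return inst.demand == state


def phase1_exit(block: List[Inst]) -> Optional[VType]:
    """Exit state of a block, computed with an unknown entry state."""
    state = None
    for inst in block:
        if not compatible(inst, state):
            state = inst.demand
    return state


def phase3_emit(block: List[Inst],
                entry: Optional[VType]) -> Tuple[List[str], Optional[VType]]:
    """Emit a vsetvli only where the tracked state is incompatible."""
    state, out = entry, []
    for inst in block:
        if not compatible(inst, state):
            out.append(f"vsetvli e{inst.demand.sew}")
            state = inst.demand
        out.append(inst.name)
    return out, state


E64_M1 = VType(64, Fraction(1))
E8_MF8 = VType(8, Fraction(1, 8))

blocks = [
    [Inst("vmseq.vi", E64_M1)],                    # bb.0
    [Inst("vmandn.mm", E8_MF8, ratio_only=True)],  # bb.1
    [Inst("vluxei64.v", E8_MF8)],                  # bb.2
]

# Phase 1: per-block exits, each computed in isolation.
assumed_exit = [phase1_exit(b) for b in blocks]
# Phase 2: each block's entry is its predecessor's assumed exit.
assumed_entry = [None] + assumed_exit[:-1]
# Phase 3: emission trusts the phase 2 entries.
emitted = [phase3_emit(b, e)[0] for b, e in zip(blocks, assumed_entry)]

# State really left behind by bb.1: its vsetvli was skipped, so the
# configuration is still the one established in bb.0.
state_after_bb0 = phase3_emit(blocks[0], None)[1]
real_exit_bb1 = phase3_emit(blocks[1], state_after_bb0)[1]

print(emitted)
print("assumed:", assumed_exit[1], "real:", real_exit_bb1)
```

In this model the exit recorded for bb.1 is SEW=8, LMUL=1/8, while the state that actually reaches bb.2 is SEW=64, LMUL=1, so bb.2 wrongly concludes that its load needs no vsetvli. Re-running the exit computation with the known entry state, as suggested above, would make the two views agree.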

rebase on vtype/sew mir changes

Harbormaster completed remote builds in B160879: Diff 424510. Apr 22 2022, 10:58 AM

LGTM

This revision is now accepted and ready to land. Apr 23 2022, 3:27 PM

This revision was landed with ongoing or failed builds. May 4 2022, 6:38 AM

Closed by commit rG78c1dcbf1bb9: [RISCV] Add a test showing incorrect VSETVLI insertion (authored by frasercrmck).

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rG78c1dcbf1bb9: [RISCV] Add a test showing incorrect VSETVLI insertion.

In D124089#3463965, @rogfer01 wrote:

One interesting thing is that computeIncomingVLVTYPE doesn't seem to be fully aligned with what emitVSETVLIs will do. If the latter chooses to skip a vsetvl then the Exit of that basic block might potentially be different to the one that we determined in computeVLVTYPEChanges and computeIncomingVLVTYPE.

Maybe aligning computeIncomingVLVTYPE with the expectations of emitVSETVLIs is possible. It looks like once we have computed InInfo in computeIncomingVLVTYPE, we may have to run computeVLVTYPEChanges again for that block (and make sure there that we use the same skip criteria as in emitVSETVLIs). The latter would now receive the InInfo (in contrast to Phase 1, where it is unknown) and would compute a potentially different Exit value. This also suggests that Phase 1 might be embedded as part of Phase 2 once we have computed the InInfo. This might make the algorithm a bit slower.

D119518 mitigates the lack of alignment by reconciling both but it means that in your case (if we do CurInfo = BlockInfo[MBB.getNumber()].Pred; for the case in which we can skip a vsetvli due to the predecessors) we get an unnecessary change at the end of bb1 which we'd want to have in bb2 instead.

Sorry for not replying earlier @rogfer01, but I wanted to thank you for your thoughts. After a bit of a break, I am about to dig in to see how best to fix it. The fact that the phases see different information is really not ideal.

frasercrmck mentioned this in D125021: [RISCV] Fix VSETVLI insertion by syncing phases 2 and 3. May 5 2022, 9:52 AM

Revision Contents

Path: llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir
Size: 87 lines

Diff 426988

llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir

(115 unchanged lines not shown)
vector.body: ; preds = %vector.body, %entry
br i1 %1, label %middle.block, label %vector.body

middle.block: ; preds = %vector.body
%2 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %0)
store i32 %2, i32* %res, align 4
ret void
}

		define void @vsetvli_vluxei64_regression() {
		ret void
		}

; Function Attrs: nofree nosync nounwind readnone willreturn
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

; Function Attrs: nounwind readnone
declare <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64.i64(<vscale x 1 x i64>, <vscale x 1 x i64>, <vscale x 1 x i64>, i64) #1

; Function Attrs: nounwind readnone
declare <vscale x 1 x i64> @llvm.riscv.vsub.nxv1i64.nxv1i64.i64(<vscale x 1 x i64>, <vscale x 1 x i64>, <vscale x 1 x i64>, i64) #1
(598 unchanged lines not shown)
bb.2.middle.block:
%21:vr = IMPLICIT_DEF
%20:vr = PseudoVMV_S_X_M1 %21, %19, 1, 5
%24:vr = IMPLICIT_DEF
%23:vr = PseudoVREDSUM_VS_M1 %24, %16, killed %20, 4, 5
PseudoVSE32_V_M1 killed %23, %8, 1, 5 :: (store (s32) into %ir.res)
PseudoRET

...
		---
		# FIXME: This test shows incorrect VSETVLI insertion. The VLUXEI64 needs
		# configuration for SEW=8 but it instead inherits a SEW=64 from the entry
		# block.
		name: vsetvli_vluxei64_regression
		tracksRegLiveness: true
		body: |
		; CHECK-LABEL: name: vsetvli_vluxei64_regression
		; CHECK: bb.0:
		; CHECK-NEXT: successors: %bb.1(0x80000000)
		; CHECK-NEXT: liveins: $x10, $x11, $x12, $v0, $v1, $v2, $v3
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: %a:gpr = COPY $x10
		; CHECK-NEXT: %b:gpr = COPY $x11
		; CHECK-NEXT: %inaddr:gpr = COPY $x12
		; CHECK-NEXT: %idxs:vr = COPY $v0
		; CHECK-NEXT: %t1:vr = COPY $v1
		; CHECK-NEXT: %t3:vr = COPY $v2
		; CHECK-NEXT: %t4:vr = COPY $v3
		; CHECK-NEXT: %t5:vrnov0 = COPY $v1
		; CHECK-NEXT: dead %14:gpr = PseudoVSETVLIX0 $x0, 88 /* e64, m1, ta, mu */, implicit-def $vl, implicit-def $vtype
		; CHECK-NEXT: %t6:vr = PseudoVMSEQ_VI_M1 %t1, 0, -1, 6 /* e64 */, implicit $vl, implicit $vtype
		; CHECK-NEXT: PseudoBR %bb.1
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: bb.1:
		; CHECK-NEXT: successors: %bb.3(0x40000000), %bb.2(0x40000000)
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: %mask:vr = PseudoVMANDN_MM_MF8 %t6, %t3, -1, 0 /* e8 */, implicit $vl, implicit $vtype
		; CHECK-NEXT: %t2:gpr = COPY $x0
		; CHECK-NEXT: BEQ %a, %t2, %bb.3
		; CHECK-NEXT: PseudoBR %bb.2
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: bb.2:
		; CHECK-NEXT: successors: %bb.3(0x80000000)
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: $v0 = COPY %mask
		; CHECK-NEXT: early-clobber %t0:vrnov0 = PseudoVLUXEI64_V_M1_MF8_MASK %t5, killed %inaddr, %idxs, $v0, -1, 3 /* e8 */, 1, implicit $vl, implicit $vtype
		; CHECK-NEXT: %ldval:vr = COPY %t0
		; CHECK-NEXT: PseudoBR %bb.3
		; CHECK-NEXT: {{ $}}
		; CHECK-NEXT: bb.3:
		; CHECK-NEXT: %stval:vr = PHI %t4, %bb.1, %ldval, %bb.2
		; CHECK-NEXT: $v0 = COPY %mask
		; CHECK-NEXT: PseudoVSOXEI64_V_M1_MF8_MASK killed %stval, killed %b, %idxs, $v0, -1, 3 /* e8 */, implicit $vl, implicit $vtype
		; CHECK-NEXT: PseudoRET
		bb.0:
		successors: %bb.1
		liveins: $x10, $x11, $x12, $v0, $v1, $v2, $v3

		%a:gpr = COPY $x10
		%b:gpr = COPY $x11
		%inaddr:gpr = COPY $x12
		%idxs:vr = COPY $v0
		%t1:vr = COPY $v1
		%t3:vr = COPY $v2
		%t4:vr = COPY $v3
		%t5:vrnov0 = COPY $v1
		%t6:vr = PseudoVMSEQ_VI_M1 %t1, 0, -1, 6
		PseudoBR %bb.1

		bb.1:
		successors: %bb.3, %bb.2

		%mask:vr = PseudoVMANDN_MM_MF8 %t6, %t3, -1, 0
		%t2:gpr = COPY $x0
		BEQ %a, %t2, %bb.3
		PseudoBR %bb.2

		bb.2:
		successors: %bb.3

		$v0 = COPY %mask
		early-clobber %t0:vrnov0 = PseudoVLUXEI64_V_M1_MF8_MASK %t5, killed %inaddr, %idxs, $v0, -1, 3, 1
		%ldval:vr = COPY %t0
		PseudoBR %bb.3

		bb.3:
		%stval:vr = PHI %t4, %bb.1, %ldval, %bb.2
		$v0 = COPY %mask
		PseudoVSOXEI64_V_M1_MF8_MASK killed %stval, killed %b, %idxs, $v0, -1, 3
		PseudoRET

		...
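
The CHECK line for PseudoVSETVLIX0 above annotates the immediate 88 as e64, m1, ta, mu. As a reading aid, here is a small sketch that decodes such an immediate, assuming the vtype field layout of the RVV 1.0 specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function name `decode_vtype` is hypothetical and not an LLVM API.

```python
def decode_vtype(imm: int) -> str:
    """Decode a vtype immediate into the 'eN, mX, ta/tu, ma/mu' notation."""
    vlmul = imm & 0b111
    vsew = (imm >> 3) & 0b111
    vta = (imm >> 6) & 1
    vma = (imm >> 7) & 1

    # vlmul encoding 0b100 is reserved; fractional LMULs use 0b101..0b111.
    lmul_names = {0: "m1", 1: "m2", 2: "m4", 3: "m8",
                  5: "mf8", 6: "mf4", 7: "mf2"}
    lmul = lmul_names.get(vlmul, "reserved")
    sew = 8 << vsew  # vsew=0 is e8, 1 is e16, 2 is e32, 3 is e64

    tail = "ta" if vta else "tu"
    mask = "ma" if vma else "mu"
    return f"e{sew}, {lmul}, {tail}, {mask}"


print(decode_vtype(88))  # the immediate used in bb.0 of the test
print(decode_vtype(69))  # the SEW=8, LMUL=1/8 configuration the load needs
```

Under the same assumption, the configuration the summary expects before the indexed load (e8, mf8, ta, mu) corresponds to the immediate 69.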