This is an archive of the discontinued LLVM Phabricator instance.

Differential D126574

[RISCV] Fix an inconsistency with compatible load/store handling
Closed · Public

Authored by reames on May 27 2022, 3:25 PM.

Details

Reviewers

craig.topper
frasercrmck
kito-cheng

Commits

rGdcdb0bf25bc8: [RISCV] Fix an inconsistency with compatible load/store handling

Summary

Once we've computed the incoming predecessor state, we can use the same compatibility check to decide if we need to insert a vsetvli before it. We in fact did this during the data flow (phase 1 and 2), but skipped doing so when using the result (phase 3).

The test changes show minor improvements, but the actual motivation is to fix a case where strict-asserts fail. I haven't yet managed to reduce a test case down to anything sensible, will update if I manage.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. May 27 2022, 3:25 PM

reames requested review of this revision. May 27 2022, 3:25 PM

Harbormaster completed remote builds in B166722: Diff 432655. May 27 2022, 4:00 PM

Add a reduced test which violates strict asserts without this change. It can't be precommitted as strict asserts are currently enabled in tree, and thus it crashes by default.

Harbormaster completed remote builds in B167085: Diff 433141. May 31 2022, 11:52 AM

I suppose the test could be added as a separate file with strict assertions disabled. But I only think that's valuable if there's a visible codegen bug in that test that we're fixing here. If it's just that the strict assertions trip but the codegen happens to come out okay then adding the test in this patch is best, if you ask me.

As for the change itself, I'm afraid I don't understand the precise issue. In phase 1 and 2 if the current info isn't valid we don't do any compatibility checks at all, do we? We just assign the NewInfo to BBInfo.Change. So I'm a bit confused where you say "We can use the same compatibility check [...] We in fact did this during the data flow (phase 1 and 2)". Also I think the use of "can" suggests this is an option we have available to us, where in fact we're fixing a bug? So isn't it required?

In D126574#3549417, @frasercrmck wrote:

> I suppose the test could be added as a separate file with strict assertions disabled. But I only think that's valuable if there's a visible codegen bug in that test that we're fixing here. If it's just that the strict assertions trip but the codegen happens to come out okay then adding the test in this patch is best, if you ask me.

There is a small codegen diff without strict asserts; both versions are "correct" (i.e. it is not fixing an active miscompile, just the assertion failure). I do not plan to precommit unless a reviewer wants it.

> As for the change itself, I'm afraid I don't understand the precise issue. In phase 1 and 2 if the current info isn't valid we don't do any compatibility checks at all, do we? We just assign the NewInfo to BBInfo.Change. So I'm a bit confused where you say "We can use the same compatibility check [...] We in fact did this during the data flow (phase 1 and 2)". Also I think the use of "can" suggests this is an option we have available to us, where in fact we're fixing a bug? So isn't it required?

Let me lay it out for you.

In phase 1, we have no block predecessor state, and thus start with invalid. We hit the store and use the state of the store. That's fine as a starting exit state, but may get further refined in phase 2.

In phase 2, we have predecessor states and merge them. This state may be compatible with the store. If so, we use the incoming state and do *not* change the state at the store. Thus, if the store is the only instruction in the block, the output state is the input state.

In phase 3, we have the same predecessor states. Currently, we don't consider the fact the store may be compatible (i.e. we don't pass MI), and thus don't apply the store compat rule. As such, we select the state of the store (different from phase 2) and propagate that forward. We get to the end of block with a different state in this case.

In my description, my wording may be a bit sloppy. The bug is that we are not consistent between phase 2 and phase 3. (Phase 1 is somewhat irrelevant since we don't have the predecessor states.) I can adjust the submit comment if you have suggestions on how to clarify.

Honestly, this isn't really an "interesting" bug. We just have two copies of the same code which are supposed to be the same, and they aren't. We should definitely common up this code, but I wanted to do that post bugfix.
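The phase 2 versus phase 3 mismatch described above can be sketched with a toy model. This is an illustration only, not the code in RISCVInsertVSETVLI.cpp: `State`, `Instr`, `needChange`, `exitStatePhase2`, and `exitStatePhase3` are hypothetical names, and the "same SEW/LMUL ratio" rule for stores is a simplifying assumption standing in for the real compatibility check.

```cpp
#include <cassert>

// Hypothetical stand-in for the abstract VL/VTYPE state (not the LLVM class).
struct State {
  int SEW;    // element width in bits; 0 models "invalid / not yet known"
  int Ratio;  // SEW/LMUL ratio
  bool isValid() const { return SEW != 0; }
  bool operator==(const State &O) const {
    return SEW == O.SEW && Ratio == O.Ratio;
  }
};

// Hypothetical stand-in for a vector instruction.
struct Instr {
  State Demanded;  // state the instruction would request in isolation
  bool IsStore;    // store that encodes its own element width
};

// Check without knowledge of the instruction: states must match exactly.
bool needChange(const State &Require, const State &Cur) {
  return !(Require == Cur);
}

// Check with knowledge of the instruction. Assumption for this sketch: a
// store can run under any valid state with the same SEW/LMUL ratio.
bool needChange(const Instr &MI, const State &Require, const State &Cur) {
  if (MI.IsStore && Cur.isValid() && Require.Ratio == Cur.Ratio)
    return false;
  return needChange(Require, Cur);
}

// Phase 2 (data flow): exit state of a block holding only MI.
State exitStatePhase2(const Instr &MI, State Pred) {
  State Cur = Pred;
  if (needChange(MI, MI.Demanded, Cur))
    Cur = MI.Demanded;
  return Cur;
}

// Phase 3 (insertion): same walk; PassMI selects which check is used.
// PassMI == false models the behaviour before this patch.
State exitStatePhase3(const Instr &MI, State Pred, bool PassMI) {
  State Cur = Pred;  // current info is invalid, so take the predecessor state
  bool Need = PassMI ? needChange(MI, MI.Demanded, Cur)
                     : needChange(MI.Demanded, Cur);
  if (Need)
    Cur = MI.Demanded;
  return Cur;
}
```

With a predecessor state of `{64, 64}` and a store that demands `{32, 64}`, phase 2 keeps the predecessor state. `exitStatePhase3(..., false)` ends the block in the store's state instead, which is the kind of disagreement a consistency assertion would catch; passing `true` makes the two phases agree.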

In D126574#3551018, @reames wrote:

> There is a small codegen diff without strict asserts; both versions are "correct" (i.e. it is not fixing an active miscompile, just the assertion failure). I do not plan to precommit unless a reviewer wants it.

No, that's fine. Cheers.

> Let me lay it out for you.
>
> In phase 1, we have no block predecessor state, and thus start with invalid. We hit the store and use the state of the store. That's fine as a starting exit state, but may get further refined in phase 2.
>
> In phase 2, we have predecessor states and merge them. This state may be compatible with the store. If so, we use the incoming state and do *not* change the state at the store. Thus, if the store is the only instruction in the block, the output state is the input state.
>
> In phase 3, we have the same predecessor states. Currently, we don't consider the fact the store may be compatible (i.e. we don't pass MI), and thus don't apply the store compat rule. As such, we select the state of the store (different from phase 2) and propagate that forward. We get to the end of block with a different state in this case.

Thanks for the explanation! I think my confusion arose because I saw phase 2 starting with predecessor state (presumably valid) and phase 3 starting with invalid state. I missed the fact that if the phase 3 state is invalid we take the predecessor state before checking compatibility. So yeah I see how the compatibility checks are different between phases 2 and 3.

> In my description, my wording may be a bit sloppy. The bug is that we are not consistent between phase 2 and phase 3. (Phase 1 is somewhat irrelevant since we don't have the predecessor states.) I can adjust the submit comment if you have suggestions on how to clarify.

Maybe something along the lines of "Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases." ? It's really just the "can" that throws me off the most.

> We just have two copies of the same code which are supposed to be the same, and they aren't. We should definitely common up this code, but I wanted to do that post bugfix.

SGTM.

Anyway, this patch LGTM. I also ran our internal testing before and after this patch, and everything passes with this patch applied. I saw lots of assertions before.

This revision is now accepted and ready to land. Jun 2 2022, 1:22 AM

This revision was landed with ongoing or failed builds. Jun 2 2022, 8:04 AM

Closed by commit rGdcdb0bf25bc8: [RISCV] Fix an inconsistency with compatible load/store handling (authored by reames).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp              2 lines
llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll     29 lines
llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir    2 lines

Diff 433756

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

[1,117 lines hidden; enclosing context: if (RISCVII::hasSEWOp(TSFlags)) {]
   MI.addOperand(MachineOperand::CreateReg(RISCV::VTYPE, /*isDef*/ false,
                                           /*isImp*/ true));
 
   if (!CurInfo.isValid()) {
     // We haven't found any vector instructions or VL/VTYPE changes yet,
     // use the predecessor information.
     CurInfo = BlockInfo[MBB.getNumber()].Pred;
     assert(CurInfo.isValid() && "Expected a valid predecessor state.");
-    if (needVSETVLI(NewInfo, CurInfo)) {
+    if (needVSETVLI(MI, NewInfo, CurInfo)) {
       // If this is the first implicit state change, and the state change
       // requested can be proven to produce the same register contents, we
       // can skip emitting the actual state change and continue as if we
       // had since we know the GPR result of the implicit state change
       // wouldn't be used and VL/VTYPE registers are correct. Note that
       // we do need to model the state as if it changed as while the
       // register contents are unchanged, the abstract model can change.
       if (needVSETVLIPHI(NewInfo, MBB))
[390 lines hidden]

llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll

[499 lines hidden]
 define <vscale x 2 x i32> @test_vsetvli_x0_x0(<vscale x 2 x i32>* %x, <vscale x 2 x i16>* %y, <vscale x 2 x i32> %z, i64 %vl, i1 %cond) nounwind {
 ; CHECK-LABEL: test_vsetvli_x0_x0:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, mu
 ; CHECK-NEXT: vle32.v v9, (a0)
 ; CHECK-NEXT: andi a0, a3, 1
 ; CHECK-NEXT: beqz a0, .LBB9_2
 ; CHECK-NEXT: # %bb.1: # %if
-; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
 ; CHECK-NEXT: vle16.v v10, (a1)
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
 ; CHECK-NEXT: vwcvt.x.x.v v8, v10
 ; CHECK-NEXT: .LBB9_2: # %if.end
 ; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, mu
 ; CHECK-NEXT: vadd.vv v8, v9, v8
 ; CHECK-NEXT: ret
 entry:
   %a = call <vscale x 2 x i32> @llvm.riscv.vle.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i32>* %x, i64 %vl)
   br i1 %cond, label %if, label %if.end
[21 lines hidden]
 define <vscale x 2 x i32> @test_vsetvli_x0_x0_2(<vscale x 2 x i32>* %x, <vscale x 2 x i16>* %y, <vscale x 2 x i16>* %z, i64 %vl, i1 %cond, i1 %cond2, <vscale x 2 x i32> %w) nounwind {
 ; CHECK-LABEL: test_vsetvli_x0_x0_2:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: vsetvli zero, a3, e32, m1, ta, mu
 ; CHECK-NEXT: vle32.v v9, (a0)
 ; CHECK-NEXT: andi a0, a4, 1
 ; CHECK-NEXT: beqz a0, .LBB10_2
 ; CHECK-NEXT: # %bb.1: # %if
-; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
 ; CHECK-NEXT: vle16.v v10, (a1)
+; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
 ; CHECK-NEXT: vwadd.wv v9, v9, v10
 ; CHECK-NEXT: .LBB10_2: # %if.end
 ; CHECK-NEXT: andi a0, a5, 1
 ; CHECK-NEXT: beqz a0, .LBB10_4
 ; CHECK-NEXT: # %bb.3: # %if2
 ; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
 ; CHECK-NEXT: vle16.v v10, (a2)
 ; CHECK-NEXT: vwadd.wv v9, v9, v10
[300 lines hidden; enclosing context: if:]
   tail call i64 @llvm.riscv.vsetvlimax.i64(i64 2, i64 1)
   br label %if.end
 
 if.end:
   %b = call <vscale x 2 x i32> @llvm.riscv.vadd.nxv2i32(<vscale x 2 x i32> undef, <vscale x 2 x i32> %a, <vscale x 2 x i32> %y, i64 %vl)
   ret <vscale x 2 x i32> %b
 }
 
+define <vscale x 1 x double> @compat_store_consistency(i1 %cond, <vscale x 1 x double> %a, <vscale x 1 x double> %b, <vscale x 1 x double>* %p1, <vscale x 1 x float> %c, <vscale x 1 x float>* %p2) {
+; CHECK-LABEL: compat_store_consistency:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: andi a0, a0, 1
+; CHECK-NEXT: vsetvli a3, zero, e64, m1, ta, mu
+; CHECK-NEXT: vfadd.vv v8, v8, v9
+; CHECK-NEXT: vs1r.v v8, (a1)
+; CHECK-NEXT: beqz a0, .LBB19_2
+; CHECK-NEXT: # %bb.1: # %if.then
+; CHECK-NEXT: vse32.v v10, (a2)
+; CHECK-NEXT: .LBB19_2: # %if.end
+; CHECK-NEXT: ret
+entry:
+  %res = fadd <vscale x 1 x double> %a, %b
+  store <vscale x 1 x double> %res, <vscale x 1 x double>* %p1
+  br i1 %cond, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+  store <vscale x 1 x float> %c, <vscale x 1 x float>* %p2
+  br label %if.end
+
+if.end: ; preds = %if.else, %if.then
+  ret <vscale x 1 x double> %res
+}
+
 declare i64 @llvm.riscv.vsetvlimax.i64(i64, i64)
 declare <vscale x 1 x double> @llvm.riscv.vle.nxv1f64.i64(<vscale x 1 x double>, <vscale x 1 x double>* nocapture, i64)
 declare <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64.i64(<vscale x 1 x double>, <vscale x 1 x double>, <vscale x 1 x double>, i64)
 declare void @llvm.riscv.vse.nxv1f64.i64(<vscale x 1 x double>, <vscale x 1 x double>* nocapture, i64)
 declare <vscale x 4 x i32> @llvm.riscv.vadd.mask.nxv4i32.nxv4i32(
   <vscale x 4 x i32>,
   <vscale x 4 x i32>,
   <vscale x 4 x i32>,
   <vscale x 4 x i1>,
   i64,
   i64);

llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir

[935 lines hidden; enclosing context: body: |]
 ; CHECK-NEXT: [[COPY4:%[0-9]+]]:gpr = COPY $x0
 ; CHECK-NEXT: BEQ killed [[PseudoVCPOP_M_B1_]], [[COPY4]], %bb.3
 ; CHECK-NEXT: PseudoBR %bb.2
 ; CHECK-NEXT: {{ $}}
 ; CHECK-NEXT: bb.2:
 ; CHECK-NEXT: successors: %bb.3(0x80000000)
 ; CHECK-NEXT: {{ $}}
 ; CHECK-NEXT: [[ADD1:%[0-9]+]]:gpr = ADD %src, [[PHI]]
-; CHECK-NEXT: dead $x0 = PseudoVSETVLIX0 killed $x0, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype, implicit $vl
 ; CHECK-NEXT: [[PseudoVLE8_V_MF8_:%[0-9]+]]:vrnov0 = PseudoVLE8_V_MF8 killed [[ADD1]], -1, 3 /* e8 */, implicit $vl, implicit $vtype
+; CHECK-NEXT: dead $x0 = PseudoVSETVLIX0 killed $x0, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype, implicit $vl
 ; CHECK-NEXT: [[PseudoVADD_VI_MF8_:%[0-9]+]]:vrnov0 = PseudoVADD_VI_MF8 [[PseudoVLE8_V_MF8_]], 4, -1, 3 /* e8 */, implicit $vl, implicit $vtype
 ; CHECK-NEXT: [[ADD2:%[0-9]+]]:gpr = ADD %dst, [[PHI]]
 ; CHECK-NEXT: PseudoVSE8_V_MF8 killed [[PseudoVADD_VI_MF8_]], killed [[ADD2]], -1, 3 /* e8 */, implicit $vl, implicit $vtype
 ; CHECK-NEXT: {{ $}}
 ; CHECK-NEXT: bb.3:
 ; CHECK-NEXT: successors: %bb.1(0x7c000000), %bb.4(0x04000000)
 ; CHECK-NEXT: {{ $}}
 ; CHECK-NEXT: [[ADD3:%[0-9]+]]:gpr = ADD [[PHI]], %inc
[53 lines hidden]