
Details

Reviewers

sdesmalen
efriedma
kmclaughlin
david-arm
craig.topper
paulwalker-arm

Summary

Fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1+c2)).
The vscale constant is legalized as an i64 DAG node, so LowerVSCALE inserts a
truncate, which causes the mismatch in D82792.

Diff Detail

Unit Tests: Failed

	Time	Test
	1,290 ms	x64 debian > LLVM.CodeGen/RISCV/rvv::calling-conv-fastcc.ll

Event Timeline

Allen created this revision. May 27 2022, 5:40 AM

Herald added a project: Restricted Project. May 27 2022, 5:40 AM

Herald added subscribers: ecnelises, psnobl, hiraditya, tschuett.

Allen requested review of this revision. May 27 2022, 5:40 AM

Herald added a project: Restricted Project. May 27 2022, 5:40 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B166629: Diff 432532. May 27 2022, 6:44 AM

efriedma added inline comments. May 27 2022, 11:47 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2622	You can't assert that VSCALE returns an i64. Probably not even on Arm, but definitely not in target-independent code.
2631	Do you really need to explicitly use the number "64" to make this work?

Allen updated this revision to Diff 432681. May 27 2022, 6:33 PM

Harbormaster completed remote builds in B166739: Diff 432681. May 27 2022, 7:14 PM

reames added a comment.

I don't claim to fully understand this, so my comment here might be off base.

I suspect your fold can be generalized as: fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1)+vscale(c2))

The vscale(c1)+vscale(c2) to vscale(C1 + C2) is handled separately above already.

If this is true, then your transform reduces to proving that it's legal to common the truncate. However, as far as I know, trunc(x) + trunc(y) is always equal to trunc(x+y). So why do we need this transform at all? Shouldn't this be covered by generic trunc folds and the existing rule?

Anyways, I'm clearly missing something here. Any idea what?
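
The identity mentioned in this comment can be checked with plain fixed-width integers. The following is a minimal sketch, assuming a 64-bit source type and a 32-bit result type; the helper names (trunc32, addOfTruncs, truncOfAdd) are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Models a truncate from i64 to i32: keep only the low 32 bits.
constexpr uint32_t trunc32(uint64_t x) { return static_cast<uint32_t>(x); }

// trunc(x) + trunc(y): truncate each operand, then add with 32-bit wraparound.
constexpr uint32_t addOfTruncs(uint64_t x, uint64_t y) {
  return static_cast<uint32_t>(trunc32(x) + trunc32(y));
}

// trunc(x + y): add with 64-bit wraparound, then truncate once.
constexpr uint32_t truncOfAdd(uint64_t x, uint64_t y) {
  return trunc32(x + y);
}

// Compile-time spot check with operands whose high bits are set.
static_assert(addOfTruncs(0x1FFFFFFFFull, 0x100000001ull) ==
                  truncOfAdd(0x1FFFFFFFFull, 0x100000001ull),
              "low 32 bits of the sum do not depend on the high bits");
```

Both forms agree because addition modulo 2^64 followed by reduction modulo 2^32 gives the same low bits as addition modulo 2^32. This only illustrates the arithmetic; it says nothing about which combine actually fires in the DAG.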

In D126532#3544144, @reames wrote:

I don't claim to fully understand this, so my comment here might be off base.

I suspect your fold can be generalized as: fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1)+vscale(c2))

The vscale(c1)+vscale(c2) to vscale(C1 + C2) is handled separately above already.

If this is true, then your transform reduces to proving that it's legal to common the truncate. However, as far as I know, trunc(x) + trunc(y) is always equal to trunc(x+y). So why do we need this transform at all? Shouldn't this be covered by generic trunc folds and the existing rule?

Anyways, I'm clearly missing something here. Any idea what?

Thanks for your attention. From my debugging, vscale turns out to be a bit special. vscale(C1) is not a constant node before LowerVSCALE, so InstCombine misses the trunc(x) + trunc(y) --> trunc(x+y) transform.
DAGCombine matches the DAG from bottom to top, so we can capture the former a+truncate(vscale(c1))+truncate(vscale(c2)), but not truncate(vscale(c1))+truncate(vscale(c2)).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2622	Thanks, applied your comment.
2631	Yes, it can be deleted

craig.topper added a subscriber: craig.topper. May 28 2022, 3:40 PM

craig.topper added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2635	Do you need to check that both vscales have the same types?

Allen marked 2 inline comments as done. May 28 2022, 3:59 PM

Allen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2635	Thanks for your attention. Since only nodes of the same type can appear on both sides of an ISD::ADD, I think the check can be omitted. Am I missing something?

craig.topper added inline comments. May 28 2022, 4:09 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2635	There's no guarantee the input types of the two truncates is the same. You could have something like (i16 trunc (i32 vscale)) and (i16 trunc (i64 vscale)).
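
The concern raised here can be illustrated on a toy model. The following is a minimal sketch, assuming a hypothetical Node struct that only records an opcode, a bit width, a vscale multiplier, and one operand; none of these names are LLVM's real classes. The matcher accepts two truncate(vscale(c)) nodes only when both vscale sources have the same width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for DAG concepts; not LLVM's SDNode or ISD opcodes.
enum class Op { Other, VScale, Truncate };

struct Node {
  Op opcode;
  unsigned bits;        // result width in bits
  int64_t multiplier;   // only meaningful for Op::VScale
  const Node *operand;  // only meaningful for Op::Truncate
};

Node makeVScale(unsigned bits, int64_t c) {
  return Node{Op::VScale, bits, c, nullptr};
}

Node makeTruncate(unsigned bits, const Node *src) {
  return Node{Op::Truncate, bits, 0, src};
}

// Returns true and sets sum = c1 + c2 when lhs and rhs are both
// truncate(vscale(c)) and the two vscale nodes have the same width.
bool matchTruncVScalePair(const Node &lhs, const Node &rhs, int64_t &sum) {
  if (lhs.opcode != Op::Truncate || rhs.opcode != Op::Truncate)
    return false;
  const Node *a = lhs.operand;
  const Node *b = rhs.operand;
  if (!a || !b || a->opcode != Op::VScale || b->opcode != Op::VScale)
    return false;
  if (a->bits != b->bits) // e.g. (i16 trunc (i32 vscale)) vs (i16 trunc (i64 vscale))
    return false;
  sum = a->multiplier + b->multiplier;
  return true;
}
```

With this sketch, two i16 truncates of i64 vscale nodes with multipliers 8 and 2 match with a sum of 10, while a pair whose sources are i32 and i64 is rejected even though both truncates produce i16.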

Add a new condition to check that both vscales have the same type

Herald added a subscriber: StephenFan. May 28 2022, 4:56 PM

Harbormaster completed remote builds in B166797: Diff 432759. May 28 2022, 6:44 PM

Allen marked 2 inline comments as done. May 29 2022, 8:48 AM

Allen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2635	Thanks for the detailed example. Applied your comment.

Allen added a reviewer: paulwalker-arm. May 31 2022, 6:12 PM

ping ?

paulwalker-arm added inline comments. Jun 6 2022, 5:58 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2629–2630	Would the following combine be safe? ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1) I ask because then the combine just above the new one would just work? I guess the problem is that operation legalisation might be the thing introducing the truncate but then we can just limit the combine to before then. I wouldn't expect the combine to be all that useful after legalisation anyway, although am happy to be proven wrong if you've a test case.
2631	Is the `VT.isScalarInteger()` check necessary? I figure the later `ISD::VSCALE` requirement will guarantee that, plus I don't see anything in the if block that actually cares.
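
The arithmetic behind the suggested combine at 2629–2630 can be sketched outside the DAG. This is a minimal model, assuming that a vscale node of a given type computes the runtime vscale value times its constant with wraparound in that type, and using i64 and i32 as the two types; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models "i64 vscale(c)": runtime vscale times c, wrapping at 64 bits.
constexpr uint64_t vscale64(uint64_t vscale, int64_t c) {
  return vscale * static_cast<uint64_t>(c);
}

// Models "i32 truncate(i64 vscale(c))".
constexpr uint32_t truncOfVScale64(uint64_t vscale, int64_t c) {
  return static_cast<uint32_t>(vscale64(vscale, c));
}

// Models "i32 vscale(c)": narrow both inputs to 32 bits first, then keep the
// low 32 bits of their product, which is a 32-bit wrapping multiply.
constexpr uint32_t vscale32(uint64_t vscale, int64_t c) {
  return static_cast<uint32_t>(
      static_cast<uint64_t>(static_cast<uint32_t>(vscale)) *
      static_cast<uint32_t>(c));
}
```

Under these assumptions the two forms produce the same 32-bit value, including for negative multipliers. Whether the narrower vscale node can then be selected by a given target is a separate question that this sketch does not address.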

Allen updated this revision to Diff 434666. Jun 6 2022, 6:15 PM

Harbormaster completed remote builds in B168201: Diff 434666. Jun 6 2022, 6:49 PM

Matt added a subscriber: Matt. Jun 10 2022, 3:11 PM

Add the check Level > AfterLegalizeVectorOps

Harbormaster completed remote builds in B169348: Diff 436266. Jun 12 2022, 8:27 PM

Allen marked 2 inline comments as done. Jun 18 2022, 10:42 PM

Allen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2618	With further debugging, I found that this pattern doesn't match during llvm::BeforeLegalizeTypes because, when node t26 is updated, only its user node t9 is added to the worklist, not its transitive user t10: https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L1607
	Recursively adding nodes such as t10 to the worklist would be unreasonable, as it may lead to a significant increase in compile time.
	SelectionDAG has 16 nodes:
	  t0: ch = EntryToken
	  t2: i32,ch = CopyFromReg t0, Register:i32 %0
	  t26: i32 = vscale Constant:i32<8> --- update last
	  t9: i32 = add nuw t2, t26
	  t19: i32 = vscale Constant:i32<2>
	  t10: i32 = add nuw t9, t19
	  t12: ch,glue = CopyToReg t0, Register:i32 $w0, t10
	  t4: i32 = vscale Constant:i32<1>
	  t24: i32 = shl t4, Constant:i64<3>
	  t13: ch = AArch64ISD::RET_FLAG t12, Register:i32 $w0, t12:1
2629–2630	Thanks @paulwalker-arm for the idea. I think ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1) is safe, as C1 is a constant. But if we combine this, ISel may crash, as we don't define such a pattern. Yes, the truncate is introduced in AArch64TargetLowering::LowerVSCALE, during legalisation for the newly added case combine_add_vscale_C_i32. BTW: added the check Level > AfterLegalizeVectorOps as suggested in the comment.
2631	Thanks, apply your comment, and deleted the unnecessary check.
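
The worklist behaviour described in the comment at 2618 can be modelled on a toy graph. This is a minimal sketch, assuming a hypothetical ToyNode type whose names mirror t26, t9 and t10 from the dump; it is not the DAGCombiner implementation, only an illustration of queueing direct users without following their users in turn.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node: a name plus the nodes that consume its value.
struct ToyNode {
  std::string name;
  std::vector<ToyNode *> users;
};

// Queue only the direct users of n; users of those users are not visited.
void addUsersToWorklist(const ToyNode &n, std::vector<ToyNode *> &worklist) {
  for (ToyNode *user : n.users)
    if (std::find(worklist.begin(), worklist.end(), user) == worklist.end())
      worklist.push_back(user);
}

// Helper for inspecting the worklist by node name.
bool contains(const std::vector<ToyNode *> &worklist, const std::string &name) {
  for (const ToyNode *node : worklist)
    if (node->name == name)
      return true;
  return false;
}
```

Building the chain t26 -> t9 -> t10 and calling addUsersToWorklist on t26 queues t9 but leaves t10 out, which matches the situation described: the outer add is not revisited after the inner vscale changes.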

Abandoning this, as I am no longer working on it.

Diff 432532

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[2,609 lines hidden; lines marked + are added by this diff] SDValue DAGCombiner::visitADD(SDNode *N) {

   // Fold (add (vscale * C0), (vscale * C1)) to (vscale * (C0 + C1)).
   if (N0.getOpcode() == ISD::VSCALE && N1.getOpcode() == ISD::VSCALE) {
     const APInt &C0 = N0->getConstantOperandAPInt(0);
     const APInt &C1 = N1->getConstantOperandAPInt(0);
     return DAG.getVScale(DL, VT, C0 + C1);
   }

   // fold a+vscale(c1)+vscale(c2) -> a+vscale(c1+c2)
   if ((N0.getOpcode() == ISD::ADD) &&
       (N0.getOperand(1).getOpcode() == ISD::VSCALE) &&
       (N1.getOpcode() == ISD::VSCALE)) {
+    assert(VT == MVT::i64 && "Unexpected element type!");
     const APInt &VS0 = N0.getOperand(1)->getConstantOperandAPInt(0);
     const APInt &VS1 = N1->getConstantOperandAPInt(0);
     SDValue VS = DAG.getVScale(DL, VT, VS0 + VS1);
     return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), VS);
   }

+  // fold a+truncate(vscale(c1))+truncate(vscale(c2))
+  // to a+truncate(vscale(c1+c2))
+  if (VT.isScalarInteger() && VT.getSizeInBits() < 64 &&
+      (N0.getOpcode() == ISD::ADD) &&
+      (N0.getOperand(1).getOpcode() == ISD::TRUNCATE) &&
+      (N0.getOperand(1).getOperand(0).getOpcode() == ISD::VSCALE) &&
+      (N1.getOpcode() == ISD::TRUNCATE) &&
+      (N1.getOperand(0).getOpcode() == ISD::VSCALE)) {
+    const APInt &VS0 =
+        N0.getOperand(1).getOperand(0)->getConstantOperandAPInt(0);
+    const APInt &VS1 = N1.getOperand(0)->getConstantOperandAPInt(0);
+    SDValue VS =
+        DAG.getZExtOrTrunc(DAG.getVScale(DL, MVT::i64, VS0 + VS1), DL, VT);
+    return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), VS);
+  }

   // Fold (add step_vector(c1), step_vector(c2) to step_vector(c1+c2))
   if (N0.getOpcode() == ISD::STEP_VECTOR &&
       N1.getOpcode() == ISD::STEP_VECTOR) {
     const APInt &C0 = N0->getConstantOperandAPInt(0);
     const APInt &C1 = N1->getConstantOperandAPInt(0);
     APInt NewStep = C0 + C1;
     return DAG.getStepVector(DL, VT, NewStep);
   }
[22,039 lines hidden]

llvm/test/CodeGen/AArch64/sve-vscale-combine.ll

[89 lines hidden; lines marked + are added by this diff]
 ; CHECK-LABEL: combine_shl_vscale_i32:
 ; CHECK-NOT: shl
 ; CHECK-NEXT: rdvl x0, #1
 ; CHECK-NEXT: ret
   %vscale = call i32 @llvm.vscale.i32()
   %shl = shl i32 %vscale, 4
   ret i32 %shl
 }

+; Fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1+c2))
+define i32 @combine_add_vscale_C_i32(i32 %index) nounwind {
+; CHECK-LABEL: combine_add_vscale_C_i32:
+; CHECK-NEXT: cntd x8, all, mul #5
+; CHECK-NEXT: add w0, w0, w8
+; CHECK-NEXT: ret
+  %vscale = call i32 @llvm.vscale.i32()
+  %mul8 = mul i32 %vscale, 8
+  %mul2 = mul i32 %vscale, 2
+  %index.next = add nuw i32 %index, %mul8
+  %add = add nuw i32 %index.next, %mul2
+  ret i32 %add
+}
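
The expected output of the new test can be checked by hand: the IR adds vscale*8 and vscale*2 to the index, and the CHECK lines expect a single `cntd x8, all, mul #5` followed by one add. The following is a minimal sketch of that arithmetic, assuming vscale is the SVE vector length in bits divided by 128 and that CNTD returns the number of 64-bit elements in a vector (2 per 128-bit granule) times its multiplier; the function names are hypothetical models, not intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Models "cntd xN, all, mul #mul": 64-bit element count times mul.
constexpr uint64_t cntd(uint64_t vscale, unsigned mul) {
  return 2 * vscale * mul;
}

// The unfolded IR: index + vscale*8 + vscale*2, all in i32.
constexpr uint32_t unfolded(uint32_t index, uint32_t vscale) {
  return index + vscale * 8u + vscale * 2u;
}

// The folded form the CHECK lines expect: add w0, w0, w8 with x8 = cntd * 5.
constexpr uint32_t folded(uint32_t index, uint32_t vscale) {
  return index + static_cast<uint32_t>(cntd(vscale, 5));
}

// Spot check for a 512-bit vector (vscale = 4) and index = 7.
static_assert(unfolded(7, 4) == folded(7, 4), "8*vscale + 2*vscale == 10*vscale");
```

Since 8 + 2 = 10 and CNTD already contributes a factor of 2, the multiplier 5 in the expected assembly corresponds to vscale(10).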

This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add a DAG combiner fold to visitADD for vscale with truncate
Abandoned, Public
