This is an archive of the discontinued LLVM Phabricator instance.


Differential D148710

[DAGCombiner] Hoist add/sub binop w/ constant op only if it won't increase divergency node
Needs Review · Public

Authored by bcl5980 on Apr 19 2023, 5:49 AM.


Details

Reviewers

RKSimon
foad
arsenm
rampitec

Diff Detail

Unit Tests: Failed

	Time	Test
	100 ms	x64 debian > LLVM.CodeGen/AMDGPU::llvm.amdgcn.s.barrier.ll
	120 ms	x64 debian > LLVM.Examples/OrcV2Examples::lljit-with-thinlto-summaries.test
	20 ms	x64 debian > LLVM.Examples/OrcV2Examples::orcv2-cbindings-add-object-file.test
	30 ms	x64 debian > LLVM.Examples/OrcV2Examples::orcv2-cbindings-basic-usage.test
	30 ms	x64 debian > LLVM.Examples/OrcV2Examples::orcv2-cbindings-lazy.test
		View Full Test Results (6 Failed)

Event Timeline

bcl5980 created this revision. · Apr 19 2023, 5:49 AM

Herald added a project: Restricted Project. · Apr 19 2023, 5:49 AM

Herald added subscribers: StephenFan, ecnelises, hiraditya.

bcl5980 requested review of this revision. · Apr 19 2023, 5:49 AM

Herald added a project: Restricted Project. · Apr 19 2023, 5:49 AM

Herald added a subscriber: llvm-commits.

This is a general solution for the regression introduced by D148463.

Harbormaster completed remote builds in B226582: Diff 514919. · Apr 19 2023, 6:31 AM

bcl5980 mentioned this in D148463: [AMDGPU] Ressociate patterns with sub to use SALU. · Apr 19 2023, 4:04 PM

update test result.

Herald added subscribers: kosarev, kerbowa, jvesely. · Apr 19 2023, 4:07 PM

bcl5980 added reviewers: arsenm, rampitec. · Apr 19 2023, 4:08 PM

Herald added a subscriber: wdng. · Apr 19 2023, 4:08 PM

Harbormaster completed remote builds in B226718: Diff 515119. · Apr 19 2023, 4:41 PM

foad added inline comments. · Apr 20 2023, 3:23 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	We also want to do this if x is divergent and y is uniform.

foad added inline comments. · Apr 20 2023, 3:25 AM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
57–58 ↗	(On Diff #515119)	Regression
78–79 ↗	(On Diff #515119)	Regression

bcl5980 added inline comments. · Apr 20 2023, 4:32 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	If x is divergent and y is uniform, we can transform to:
(x + C) - y --> x + (C - y)
y - (x + C) --> (y - C) - x
(x - C) - y --> x - (C + y) ;; this is a special case: AMDGPU doesn't support op1 as an SGPR, so we needn't optimize this case. But this should be a TLI behavior. Generally I think we still can't enable it.
(C - x) - y --> (C - y) - x

So I believe in DAGCombiner we only need to enable it when the divergence property is the same.
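
As a sanity check on the four rewrites listed in this comment, the sketch below confirms that each one is an identity under wrapping 32-bit arithmetic. It is illustrative only and not part of the patch: `identitiesHold` is a hypothetical helper name, and the check says nothing about whether a rewrite is profitable on any target.

```cpp
#include <cstdint>

// Unsigned 32-bit arithmetic wraps modulo 2^32, which models plain ADD/SUB
// nodes that carry no overflow flags. (Assumes int is at most 32 bits wide,
// so std::uint32_t operands are not promoted to a signed type.)
constexpr bool identitiesHold(std::uint32_t x, std::uint32_t y, std::uint32_t C) {
  return ((x + C) - y == x + (C - y)) &&   // (x + C) - y --> x + (C - y)
         (y - (x + C) == (y - C) - x) &&   // y - (x + C) --> (y - C) - x
         ((x - C) - y == x - (C + y)) &&   // (x - C) - y --> x - (C + y)
         ((C - x) - y == (C - y) - x);     // (C - x) - y --> (C - y) - x
}

// Compile-time spot checks, including values that force wraparound.
static_assert(identitiesHold(1u, 2u, 3u), "small values");
static_assert(identitiesHold(0u, 0xFFFFFFFFu, 7u), "wraparound in y");
static_assert(identitiesHold(0xFFFFFFFFu, 5u, 0x80000000u), "wraparound in x and C");
```

All four forms keep the constant attached to the uniform operand y, which is the grouping bcl5980 argues for later in the thread.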

bcl5980 added inline comments. · Apr 20 2023, 5:02 AM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
57–58 ↗	(On Diff #515119)	I think the scalar instruction + VOP2 instruction should save power compared to v_xad_u32, especially when the xor value is -1. But if you insist I can try to "fix" it.

foad added inline comments. · Apr 20 2023, 5:17 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	The rules we implement in DAGCombiner should make sense without having to understand exactly what any particular target can do. The original rule here was "pull constants out of nested adds/subs". The new rule is "pull constants out of nested adds/subs, unless that increases the number of divergent nodes in the dag".

bcl5980 added inline comments. · Apr 20 2023, 5:26 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	Yeah, the rule is correct. But I think it means `(N0->isDivergent() == N1->isDivergent())`, not `(N0->isDivergent() || !N1->isDivergent())`.

foad added inline comments. · Apr 20 2023, 8:39 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	These reassociations only increase the number of divergent nodes in one case: when x is uniform and y is divergent.
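
The claim in this comment can be checked with a small counting model. The sketch below is a toy, not LLVM code: it assumes a binary op node is divergent exactly when at least one operand is divergent and that constants are uniform (real SelectionDAG divergence also depends on target hooks). It covers only the `(x op C) op y --> (x op y) op C` shape, and all function names are hypothetical.

```cpp
// Divergent op nodes in (x op C) op y: the inner node depends on x alone,
// the outer node depends on both x and y.
constexpr int divergentOpsBefore(bool xDiv, bool yDiv) {
  return int(xDiv) + int(xDiv || yDiv);
}

// Divergent op nodes in (x op y) op C: both nodes depend on x and y.
constexpr int divergentOpsAfter(bool xDiv, bool yDiv) {
  return 2 * int(xDiv || yDiv);
}

// True when hoisting the constant creates an extra divergent node.
constexpr bool hoistAddsDivergentNode(bool xDiv, bool yDiv) {
  return divergentOpsAfter(xDiv, yDiv) > divergentOpsBefore(xDiv, yDiv);
}

static_assert(!hoistAddsDivergentNode(false, false), "uniform x, uniform y");
static_assert(hoistAddsDivergentNode(false, true), "uniform x, divergent y");
static_assert(!hoistAddsDivergentNode(true, false), "divergent x, uniform y");
static_assert(!hoistAddsDivergentNode(true, true), "divergent x, divergent y");
```

Under these assumptions the count rises only for uniform x with divergent y (from one divergent node to two), which matches the single case named in the comment.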

bcl5980 added inline comments. · Apr 20 2023, 3:36 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	There are three operands: one is uniform, one is divergent, and one is constant, so we can always do `(uniform op constant) op divergent`. So we haven't used any target-specific information here. I still think `N0->isDivergent() == N1->isDivergent()` is better and cleaner. But if you insist I can change to your way.

bcl5980 updated this revision to Diff 515514. · Apr 20 2023, 3:43 PM

bcl5980 retitled this revision from "[DAGCombiner] Limit 'hoist add/sub binop w/ constant op' to the same divergency property" to "[DAGCombiner] Hoist add/sub binop w/ constant op only if it won't increase divergency node".

bcl5980 edited the summary of this revision.

Harbormaster completed remote builds in B227004: Diff 515514. · Apr 20 2023, 4:46 PM

foad added inline comments. · Apr 21 2023, 4:06 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	If you use `N0->isDivergent() == N1->isDivergent()` then this target-independent code will not transform:
`(divergent op constant) op uniform`
into
`(divergent op uniform) op constant`
I think it *should* do that transform, because:
- pulling constants out is generally useful (that is why this code exists in the first place)
- it does not increase the number of divergent nodes: both before and after the transform, there are two divergent "op" nodes

Can you explain why it should *not* do the transform?
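
To make the disagreement concrete, here are the two candidate guards side by side as plain boolean functions. This is only an illustration with hypothetical names; `n0Div` and `n1Div` stand in for `N0->isDivergent()` and `N1->isDivergent()`.

```cpp
// Guard from the original diff: hoist only when both operands agree.
constexpr bool guardSameDivergence(bool n0Div, bool n1Div) {
  return n0Div == n1Div;
}

// Guard suggested in review: hoist unless N0 is uniform and N1 is divergent.
constexpr bool guardNoNewDivergentNode(bool n0Div, bool n1Div) {
  return n0Div || !n1Div;
}

// The guards agree on three of the four input combinations...
static_assert(guardSameDivergence(false, false) == guardNoNewDivergentNode(false, false), "");
static_assert(guardSameDivergence(false, true) == guardNoNewDivergentNode(false, true), "");
static_assert(guardSameDivergence(true, true) == guardNoNewDivergentNode(true, true), "");

// ...and differ only when N0 is divergent and N1 is uniform.
static_assert(!guardSameDivergence(true, false), "equality guard blocks the hoist");
static_assert(guardNoNewDivergentNode(true, false), "suggested guard allows the hoist");
```

The single differing row, divergent N0 with uniform N1, is the `(divergent op constant) op uniform` case this comment asks about.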

foad added inline comments. · Apr 21 2023, 4:07 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	> But if you insist I can change to your way.

I don't want to insist, I want to reach agreement :)

bcl5980 added inline comments. · Apr 21 2023, 10:19 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	I think it's unnecessary because that case should be transformed to `(uniform op constant) op divergent`. Hoisting the constant brings potential benefits, but producing a uniform instruction brings real benefits.
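
The three possible groupings of one divergent, one uniform and one constant operand can be compared with a toy expression tree. This is a sketch under the same simplifying assumption as the discussion (an op is divergent when any operand is), not LLVM's `SDNode`; `Expr`, `leaf`, `op`, `isDivergent` and `countDivergentOps` are hypothetical names.

```cpp
// Toy expression tree. Leaves have null children; op nodes have two.
struct Expr {
  bool divergentLeaf; // only read for leaves
  const Expr *lhs;    // null for a leaf
  const Expr *rhs;    // null for a leaf
};

inline Expr leaf(bool divergent) { return Expr{divergent, nullptr, nullptr}; }

// Operands are held by pointer, so they must outlive the node built here.
inline Expr op(const Expr &a, const Expr &b) { return Expr{false, &a, &b}; }

// A leaf carries its own flag; an op is divergent if either operand is.
inline bool isDivergent(const Expr &e) {
  if (e.lhs == nullptr)
    return e.divergentLeaf;
  return isDivergent(*e.lhs) || isDivergent(*e.rhs);
}

// Number of divergent op nodes in the tree rooted at e (leaves do not count).
inline int countDivergentOps(const Expr &e) {
  if (e.lhs == nullptr)
    return 0;
  return (isDivergent(e) ? 1 : 0) + countDivergentOps(*e.lhs) +
         countDivergentOps(*e.rhs);
}
```

Building the trees from named variables `d = leaf(true)`, `u = leaf(false)` and `c = leaf(false)`, both `(d op c) op u` and `(d op u) op c` contain two divergent ops, while `(u op c) op d` contains one. That is the benefit this comment attributes to grouping the constant with the uniform operand.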

foad added inline comments. · Apr 22 2023, 6:03 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	> I think it's unnecessary because that case should be transformed to: `(uniform op constant) op divergent`

That sounds fine, but is it implemented yet (in a target-independent way)? Or can you implement it in this patch or in a follow-up?

bcl5980 added inline comments. · Apr 23 2023, 4:33 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3874–3899	Maybe I can move D148463 into target-independent code? I think it's OK for me to use `if (N0->isDivergent() || !N1->isDivergent())` in this patch, and later we can proceed step by step if necessary. Another question: do I need to fix the "regression" for v_xad_u32 before this patch?

arsenm added inline comments. · Jun 22 2023, 6:56 AM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
77 ↗	(On Diff #515514)	This is the kind of regression I expect out of globalisel, I'm surprised the DAG regressed here

Allen added a subscriber: Allen. · Aug 5 2023, 7:38 AM

Revision Contents

Path: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Size: 50 lines

Diff 514919

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

SDValue DAGCombiner::visitSUB(SDNode *N) {
  ...

  // And if the target does not like this form then turn into:
  // add (add x, y), 1
  if (TLI.preferIncOfAddToSubOfNot(VT) && N1.hasOneUse() && isBitwiseNot(N1)) {
    SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0, N1.getOperand(0));
    return DAG.getNode(ISD::ADD, DL, VT, Add, DAG.getConstant(1, DL, VT));
  }

  // Hoist one-use addition by non-opaque constant:
  if (N0->isDivergent() == N1->isDivergent()) {
    // (x + C) - y -> (x - y) + C
    if (N0.getOpcode() == ISD::ADD && N0.hasOneUse() &&
        isConstantOrConstantVector(N0.getOperand(1), /*NoOpaques=*/true)) {
      SDValue Sub = DAG.getNode(ISD::SUB, DL, VT, N0.getOperand(0), N1);
      return DAG.getNode(ISD::ADD, DL, VT, Sub, N0.getOperand(1));
    }
    // y - (x + C) -> (y - x) - C
    if (N1.getOpcode() == ISD::ADD && N1.hasOneUse() &&
        isConstantOrConstantVector(N1.getOperand(1), /*NoOpaques=*/true)) {
      SDValue Sub = DAG.getNode(ISD::SUB, DL, VT, N0, N1.getOperand(0));
      return DAG.getNode(ISD::SUB, DL, VT, Sub, N1.getOperand(1));
    }
    // (x - C) - y -> (x - y) - C
    // This is necessary because SUB(X,C) -> ADD(X,-C) doesn't work for vectors.
    if (N0.getOpcode() == ISD::SUB && N0.hasOneUse() &&
        isConstantOrConstantVector(N0.getOperand(1), /*NoOpaques=*/true)) {
      SDValue Sub = DAG.getNode(ISD::SUB, DL, VT, N0.getOperand(0), N1);
      return DAG.getNode(ISD::SUB, DL, VT, Sub, N0.getOperand(1));
    }
    // (C - x) - y -> C - (x + y)
    if (N0.getOpcode() == ISD::SUB && N0.hasOneUse() &&
        isConstantOrConstantVector(N0.getOperand(0), /*NoOpaques=*/true)) {
      SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(1), N1);
      return DAG.getNode(ISD::SUB, DL, VT, N0.getOperand(0), Add);
    }

foad (Unsubmitted, Not Done):

// Hoist one-use addition by non-opaque constant:
- if (N0->isDivergent() == N1->isDivergent()) {
+ if (N0->isDivergent() || !N1->isDivergent()) {
// (x + C) - y -> (x - y) + C

We also want to do this if x is divergent and y is uniform.


  }

  // If the target's bool is represented as 0/-1, prefer to make this 'add 0/-1'
  // rather than 'sub 0/1' (the sext should get folded).
  // sub X, (zext i1 Y) --> add X, (sext i1 Y)
  if (N1.getOpcode() == ISD::ZERO_EXTEND &&
      N1.getOperand(0).getScalarValueSizeInBits() == 1 &&
      TLI.getBooleanContents(VT) ==
          TargetLowering::ZeroOrNegativeOneBooleanContent) {
