This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Hoist add/sub binop w/ constant op only if it won't increase divergency node
Needs Review · Public

Authored by bcl5980 on Apr 19 2023, 5:49 AM.

Details

Diff Detail

Event Timeline

bcl5980 created this revision. · Apr 19 2023, 5:49 AM
Herald added a project: Restricted Project. · View Herald Transcript · Apr 19 2023, 5:49 AM
bcl5980 requested review of this revision. · Apr 19 2023, 5:49 AM

This is a general solution for the regression from D148463.

bcl5980 updated this revision to Diff 515119. · Apr 19 2023, 4:07 PM

update test result.

foad added inline comments. · Apr 20 2023, 3:23 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

We also want to do this if x is divergent and y is uniform.

foad added inline comments. · Apr 20 2023, 3:25 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
57–59

Regression

78–80

Regression

bcl5980 added inline comments. · Apr 20 2023, 4:32 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

If x is divergent and y is uniform, we can transform to:
(x + C) - y --> x + (C - y)
y - (x + C) --> (y - C) - x
(x - C) - y --> x - (C + y) ;; this is a special case: AMDGPU doesn't support op1 as an sgpr, so we needn't optimize this case. But that should be TLI behavior; generally I think we still can't enable it.
(C - x) - y --> (C - y) - x

So I believe in DAGCombiner we only need to enable it when both operands have the same divergence property.

bcl5980 added inline comments. · Apr 20 2023, 5:02 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
57–59

I think a scalar instruction + a VOP2 instruction should save power compared to v_xad_u32, especially when the xor value is -1.
But if you insist, I can try to "fix" it.

foad added inline comments. · Apr 20 2023, 5:17 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

The rules we implement in DAGCombiner should make sense without having to understand exactly what any particular target can do.

The original rule here was "pull constants out of nested adds/subs".

The new rule is "pull constants out of nested adds/subs, unless that increases the number of divergent nodes in the dag".

bcl5980 added inline comments. · Apr 20 2023, 5:26 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

Yeah, the rule is correct. But I think it means (N0->isDivergent() == N1->isDivergent()), not (N0->isDivergent() || !N1->isDivergent()).

foad added inline comments. · Apr 20 2023, 8:39 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

These reassociations only increase the number of divergent nodes in one case: when x is uniform and y is divergent.

bcl5980 added inline comments. · Apr 20 2023, 3:36 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

There are 3 operands: one is uniform, one is divergent, and one is constant. We can always do (uniform op constant) op divergent.
So we haven't used any target-specific information here. I still think N0->isDivergent() == N1->isDivergent() is better and cleaner.
But if you insist, I can change it to your way.

bcl5980 updated this revision to Diff 515514. · Apr 20 2023, 3:43 PM
bcl5980 retitled this revision from [DAGCombiner] Limit 'hoist add/sub binop w/ constant op' to the same divergency property to [DAGCombiner] Hoist add/sub binop w/ constant op only if it won't increase divergency node.
bcl5980 edited the summary of this revision. (Show Details)
foad added inline comments. · Apr 21 2023, 4:06 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

If you use N0->isDivergent() == N1->isDivergent(), then this target-independent code will not transform (divergent op constant) op uniform into (divergent op uniform) op constant.
I think it *should* do that transform, because:

  1. pulling constants out is generally useful (that is why this code exists in the first place)
  2. it does not increase the number of divergent nodes - both before and after the transform, there are two divergent "op" nodes

Can you explain why it should *not* do the transform?

foad added inline comments. · Apr 21 2023, 4:07 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

But if you insist I can change to your way.

I don't want to insist, I want to reach agreement :)

bcl5980 added inline comments. · Apr 21 2023, 10:19 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

I think it's unnecessary because that case should be transformed to:
'(uniform op constant) op divergent'
Hoisting the constant gives potential benefits, but a uniform instruction gives real benefits.

foad added inline comments. · Apr 22 2023, 6:03 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

I think it's unnecessary because that case should be transformed to:
'(uniform op constant) op divergent'

That sounds fine, but is it implemented yet (in a target-independent way)? Or can you implement it in this patch or in a follow-up?

bcl5980 added inline comments. · Apr 23 2023, 4:33 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3903–3904

Maybe I can move D148463 to a target-independent implementation?
I think it's OK for me to use if (N0->isDivergent() || !N1->isDivergent()) in this patch, and later we can go step by step if necessary.
And another question: do I need to fix the "regression" for v_xad_u32 before this patch?

arsenm added inline comments. · Jun 22 2023, 6:56 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
77

This is the kind of regression I expect out of GlobalISel; I'm surprised the DAG regressed here.

Allen added a subscriber: Allen. · Aug 5 2023, 7:38 AM