Page MenuHomePhabricator

[AMDGPU] Simplify negated condition
ClosedPublic

Authored by rampitec on Thu, Dec 6, 5:31 PM.

Details

Summary

Optimize sequence:

%sel = V_CNDMASK_B32_e64 0, 1, %cc
%cmp = V_CMP_NE_U32 1, %1
$vcc = S_AND_B64 $exec, %cmp
S_CBRANCH_VCC[N]Z

>

$vcc = S_ANDN2_B64 $exec, %cc
S_CBRANCH_VCC[N]Z

It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC().

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Thu, Dec 6, 5:31 PM
arsenm added inline comments.Thu, Dec 6, 7:13 PM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
166 ↗(On Diff #177088)

Op2 doesn't need to be an immediate 1, it just needs to match the selected operand

186–190 ↗(On Diff #177088)

Weird formatting

237–240 ↗(On Diff #177088)

Why is just Reg itself OK? Doesn't this need to use the reg alias iterator?

test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir
420–423 ↗(On Diff #177088)

You can drop all of the register sections

test/CodeGen/AMDGPU/optimize-negated-cond.ll
3 ↗(On Diff #177088)

Function label

10 ↗(On Diff #177088)

Can this be reduced?

rampitec updated this revision to Diff 177125.Thu, Dec 6, 10:02 PM
rampitec marked 8 inline comments as done.
rampitec added inline comments.
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
166 ↗(On Diff #177088)

I do not think it will work if we have select(0, 0, cc). Anyway this tries to catch a very specific pattern coming from DAG, and that is select(0, 1, cc) there. Otherwise I would need to check that selected operand is identical to Op2 *and* src0 != src1 in select. Sounds like an overkill for the problem.

186–190 ↗(On Diff #177088)

Processed with clang-format.

237–240 ↗(On Diff #177088)

Reg can or can be not a VCC. SCC is needed because s_and_b64 produces it as well. Note, RecalcRegs is a set, so if duplicated only unique values will be added. A reg unit iterator for phys is used later at the bottom of the pass to squash it all for the whole pass.

test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir
420–423 ↗(On Diff #177088)

Indeed, thanks!

test/CodeGen/AMDGPU/optimize-negated-cond.ll
10 ↗(On Diff #177088)

It is already reduced with bug point. In presence of mir test I can drop it altogether if you want.

nhaehnle added inline comments.Fri, Dec 7, 8:08 AM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140 ↗(On Diff #177125)

Does this (and the other findReachingDef) need to be guarded to ensure the instructions come from the same basic block?

Or, to be more precise, against changes to EXEC in between?

rampitec marked an inline comment as done.Fri, Dec 7, 9:18 AM
rampitec added inline comments.
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140 ↗(On Diff #177125)

findReachingDef does that check. Instructions must belong to the same BB. Otherwise a dominator tree would be required. Specific check for exec is not needed, none of the instructions in that chain modify exec, and there can be no such instruction in between of first two as they are not terminators.

arsenm added inline comments.Fri, Dec 7, 11:20 AM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140 ↗(On Diff #177125)

The dominator tree is implicitly used by LiveIntervals anyway

test/CodeGen/AMDGPU/optimize-negated-cond.ll
10 ↗(On Diff #177088)

Now that you have the specific pattern, you should be able to shrink it?

rampitec updated this revision to Diff 177281.Fri, Dec 7, 12:33 PM
rampitec marked 2 inline comments as done.

Reduced test.

rampitec updated this revision to Diff 177294.Fri, Dec 7, 1:28 PM
rampitec marked 2 inline comments as done.

Switched to MDT use.
Added tests with cross block defines.

lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140 ↗(On Diff #177125)

OK, I have switched to MDT use.

arsenm added inline comments.Tue, Dec 11, 1:10 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1617 ↗(On Diff #177298)

Isn’t there a get cached interval or is that only for regunits?

1641 ↗(On Diff #177298)

Do you actually need the dominate check? Don’t you implicitly do that when comparing the value numbers?

rampitec marked 2 inline comments as done.Tue, Dec 11, 1:19 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIRegisterInfo.cpp
1617 ↗(On Diff #177298)

Right, there is getCachedRegUnit(), but not getCachedLiventerval().

1641 ↗(On Diff #177298)

This is physreg. I need to find a last def. Unfortunately individual reg units get individual LiveRanges, unlike virtual registers. So if I query a def of VCC I am getting two individual VNI for VCC_LO and VCC_HI and need to compare to find a last one. This is checked by negated_cond_vop2_redef_vcc2 test, which miscompiles w/o this check.

arsenm accepted this revision.Wed, Dec 12, 6:36 PM

LGTM. I think this is OK, but I think there still might be a more proper way to use the live intervals

This revision is now accepted and ready to land.Wed, Dec 12, 6:36 PM
This revision was automatically updated to reflect the committed changes.