This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Simplify negated condition
ClosedPublic

Authored by rampitec on Dec 6 2018, 5:31 PM.

Details

Summary

Optimize sequence:

%sel = V_CNDMASK_B32_e64 0, 1, %cc
%cmp = V_CMP_NE_U32 1, %1
$vcc = S_AND_B64 $exec, %cmp
S_CBRANCH_VCC[N]Z

>

$vcc = S_ANDN2_B64 $exec, %cc
S_CBRANCH_VCC[N]Z

It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC().

Diff Detail

Event Timeline

rampitec created this revision.Dec 6 2018, 5:31 PM
arsenm added inline comments.Dec 6 2018, 7:13 PM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
166

Op2 doesn't need to be an immediate 1, it just needs to match the selected operand

186–190

Weird formatting

237–240

Why is just Reg itself OK? Doesn't this need to use the reg alias iterator?

test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir
420–423

You can drop all of the register sections

test/CodeGen/AMDGPU/optimize-negated-cond.ll
3

Function label

10

Can this be reduced?

rampitec updated this revision to Diff 177125.Dec 6 2018, 10:02 PM
rampitec marked 8 inline comments as done.
rampitec added inline comments.
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
166

I do not think it will work if we have select(0, 0, cc). Anyway this tries to catch a very specific pattern coming from DAG, and that is select(0, 1, cc) there. Otherwise I would need to check that selected operand is identical to Op2 *and* src0 != src1 in select. Sounds like an overkill for the problem.

186–190

Processed with clang-format.

237–240

Reg can or can be not a VCC. SCC is needed because s_and_b64 produces it as well. Note, RecalcRegs is a set, so if duplicated only unique values will be added. A reg unit iterator for phys is used later at the bottom of the pass to squash it all for the whole pass.

test/CodeGen/AMDGPU/optimize-negated-cond-exec-masking.mir
420–423

Indeed, thanks!

test/CodeGen/AMDGPU/optimize-negated-cond.ll
10

It is already reduced with bug point. In presence of mir test I can drop it altogether if you want.

nhaehnle added inline comments.Dec 7 2018, 8:08 AM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140

Does this (and the other findReachingDef) need to be guarded to ensure the instructions come from the same basic block?

Or, to be more precise, against changes to EXEC in between?

rampitec marked an inline comment as done.Dec 7 2018, 9:18 AM
rampitec added inline comments.
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140

findReachingDef does that check. Instructions must belong to the same BB. Otherwise a dominator tree would be required. Specific check for exec is not needed, none of the instructions in that chain modify exec, and there can be no such instruction in between of first two as they are not terminators.

arsenm added inline comments.Dec 7 2018, 11:20 AM
lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140

The dominator tree is implicitly used by LiveIntervals anyway

test/CodeGen/AMDGPU/optimize-negated-cond.ll
10

Now that you have the specific pattern, you should be able to shrink it?

rampitec updated this revision to Diff 177281.Dec 7 2018, 12:33 PM
rampitec marked 2 inline comments as done.

Reduced test.

rampitec updated this revision to Diff 177294.Dec 7 2018, 1:28 PM
rampitec marked 2 inline comments as done.

Switched to MDT use.
Added tests with cross block defines.

lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp
140

OK, I have switched to MDT use.

arsenm added inline comments.Dec 11 2018, 1:10 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1616

Isn’t there a get cached interval or is that only for regunits?

1640

Do you actually need the dominate check? Don’t you implicitly do that when comparing the value numbers?

rampitec marked 2 inline comments as done.Dec 11 2018, 1:19 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIRegisterInfo.cpp
1616

Right, there is getCachedRegUnit(), but not getCachedLiventerval().

1640

This is physreg. I need to find a last def. Unfortunately individual reg units get individual LiveRanges, unlike virtual registers. So if I query a def of VCC I am getting two individual VNI for VCC_LO and VCC_HI and need to compare to find a last one. This is checked by negated_cond_vop2_redef_vcc2 test, which miscompiles w/o this check.

arsenm accepted this revision.Dec 12 2018, 6:36 PM

LGTM. I think this is OK, but I think there still might be a more proper way to use the live intervals

This revision is now accepted and ready to land.Dec 12 2018, 6:36 PM
This revision was automatically updated to reflect the committed changes.