This is an archive of the discontinued LLVM Phabricator instance.

CodeGenPrepare: Avoid and/or i1 in select conditions.
ClosedPublic

Authored by MatzeB on Feb 3 2015, 8:13 PM.

Details

Summary

This transforms:
select((C0 & C1), a, b) -> select(C0, select(C1, a, b), b)
select((C0 | C1), a, b) -> select(C0, a, select(C1, a, b))

The result is better on most targets as C0 and C1 do not need to be
materialized into an integer register but can stay flags.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 19299.Feb 3 2015, 8:13 PM
MatzeB retitled this revision from to InstCombine: Transform select with and/or cond into select sequence.
MatzeB added a reviewer: hfinkel.
MatzeB updated this object.
MatzeB added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Feb 3 2015, 8:59 PM

In my early tests the transformation was benficial on arm, aarch64, x86 but of course not on powerpc. Either the powerpc target should normalize into the other direction or we add another TargetTransformInfo flag that tells InstCombine which is the preferred form. I am not sure which to pick.

This is fine for PPC too. LLVM's PPC backend stores boolean values in CR bit registers, and on modern cores with isel, this transformation seems fairly neutral. Your first example goes from:

.L.test_select_and:
cmpw 0, 3, 6
cmpw 1, 3, 7
crandc 20, 4, 0
isel 3, 4, 5, 20
blr

to:

.L.test_select_andr:
cmpw 0, 3, 7
cmpw 7, 3, 6
isel 4, 4, 5, 0
isel 3, 5, 4, 28
blr

On a P7/P8, this seems pretty neutral (both the crandc and the isel are cracked). On the A2, you've increased the critical path length by 1 cycle (isel takes two cycles, but crandc only takes one). Nevertheless, I think that I can adjust for that during isel fairly easily.

The real question, as far as InstCombine goes, is does this make the most-useful canonical form? I'd tend to think that the answer is no, because the logical operators seem more likely to be amenable to further simplification than chains of selects. I could certainly be wrong, however. Also, not all cores have conditional moves (although many now do). Another option is to put this in DAGCombine (or CodeGenPrep), and there you might as well make it target customizable (I'd likely disable it for the PPC A2).

MatzeB updated this revision to Diff 19442.Feb 5 2015, 4:05 PM
MatzeB retitled this revision from InstCombine: Transform select with and/or cond into select sequence to CodeGenPrepare: Avoid and/or i1 in select conditions..
MatzeB updated this object.
MatzeB added a reviewer: arsenm.
MatzeB set the repository for this revision to rL LLVM.

New Version: This expects the middleend to have selects normalized to selects with and/or conditions (see http://reviews.llvm.org/D7450). The Code to break it into a sequence of multiple selects is now in the CodeGenPrepare pass and relies on a TargetLowering hook to test whether the transformation is useful.

This revision was automatically updated to reflect the committed changes.

On a side note: Ahmed just mentioned a snipped from SelectionDAGBuilder to me that has pretty much the same logic I do for selects implemented for branches:

// If this is a series of conditions that are or'd or and'd together, emit

// this as a sequence of branches instead of setcc's with and/or operations.
// As long as jumps are not expensive, this should improve performance.
// For example, instead of something like:
//     cmp A, B
//     C = seteq
//     cmp D, E
//     F = setle
//     or C, F
//     jnz foo
// Emit:
//     cmp A, B
//     je foo
//     cmp D, E
//     jle foo
//

...

no idea why that is in SelectionDAGBuilder, it also doesn't seem strictly beneficial to all targets to me...

On a side note: Ahmed just mentioned a snipped from SelectionDAGBuilder to me that has pretty much the same logic I do for selects implemented for branches:

// If this is a series of conditions that are or'd or and'd together, emit

// this as a sequence of branches instead of setcc's with and/or operations.
// As long as jumps are not expensive, this should improve performance.
// For example, instead of something like:
//     cmp A, B
//     C = seteq
//     cmp D, E
//     F = setle
//     or C, F
//     jnz foo
// Emit:
//     cmp A, B
//     je foo
//     cmp D, E
//     jle foo
//

...

no idea why that is in SelectionDAGBuilder, it also doesn't seem strictly beneficial to all targets to me...

It's not (and it was pointed out to me in a different thread a couple of months ago); which function does this again? I keep forgetting to fix this... (my current plan is to turn it off when TLI->hasMultipleConditionRegisters() is true, which is only the case on PPC AFAIK).

SelectionDAGBuilder::visitBr

hfinkel added inline comments.Feb 12 2015, 9:53 PM
llvm/trunk/lib/Transforms/InstCombine/InstCombineSelect.cpp
1195 ↗(On Diff #19492)

I think that you want to explicitly add And to the worklist here (assuming that the builder has not constant-folded it).

if (Instruction *AndI = dyn_cast<Instruction>(And))
  Worklist.Add(AndI);

otherwise we might not actually revisit it until the next main outer instcombine iteration.