This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] try to fold (select C, (sext A), B) into logical ops
ClosedPublic

Authored by nhaehnle on Jul 25 2016, 3:30 AM.

Details

Summary

Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and
B is a compatible constant, also for zext instead of sext. This will then be
further folded into logical operations.

The transformation would be valid for non-i1 types as well, but other parts of
InstCombine prefer to have sext from non-i1 as an operand of select.

Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32
for boolean operations. With this change, the boolean logic is fully
recovered.

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 65321.Jul 25 2016, 3:30 AM
nhaehnle retitled this revision from to [InstCombine] try to fold (select C, (sext A), B) into logical ops.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
arsenm added a subscriber: arsenm.Jul 25 2016, 1:19 PM
arsenm added inline comments.
lib/Transforms/InstCombine/InstCombineSelect.cpp
918–921 ↗(On Diff #65321)

I would swap the parameter order so that the Builder is first and all bools are at the end

919 ↗(On Diff #65321)

Make C a ConstantInt and dyn_cast at the call site?

938–941 ↗(On Diff #65321)

No return after else

nhaehnle updated this revision to Diff 65482.Jul 26 2016, 2:29 AM
nhaehnle marked 3 inline comments as done.

Implement review suggestions.

spatel added inline comments.Aug 2 2016, 10:23 AM
test/Transforms/InstCombine/select-bitext.ll
1 ↗(On Diff #65482)

Please:

  1. Remove all CHECK lines from this file.
  2. Auto-generate all CHECK lines using utils/update_test_checks.py against trunk.
  3. Commit the test file to trunk to show the current behavior.
  4. Apply your patch locally.
  5. Regenerate the CHECK lines again using the script.
  6. Update this patch.

This will allow us to see the exact diffs caused by this patch.

nhaehnle updated this revision to Diff 66659.Aug 3 2016, 6:46 AM
nhaehnle marked an inline comment as done.

I have just committed the tests baseline to trunk, this is the updated
patch.

spatel edited edge metadata.Aug 3 2016, 8:04 AM

I have just committed the tests baseline to trunk, this is the updated
patch.

Thanks!

Sorry I didn't notice this earlier, but can you reduce the first 8 tests by making them have i1 parameters instead of having icmp instructions in the tests?

Is there some reason we shouldn't do this transform for vector selects with splatted constants? If that's possible, it could actually simplify the patch by having foldSelectExtConst() take an APInt rather than a ConstantInt parameter with the caller code looking something like this:

if (TI && (TI->getOpcode() == Instruction::ZExt || TI->getOpcode() == Instruction::SExt)) {
  const APInt *C;
  if (match(FalseVal, m_APInt(C))) {
    bool IsSext = TI->getOpcode() == Instruction::SExt;
    if (auto *Res = foldSelectExtConst(*Builder, SI, TI, C, true, IsSext))
      return Res;
  }
}
if (FI... )
nhaehnle updated this revision to Diff 66690.Aug 3 2016, 12:21 PM
nhaehnle edited edge metadata.

Simplified the test cases (baseline is already in SVN), and use the APInt
hint, thanks!

About the vector case: As you can see, it requires a change to
FoldOpIntoSelect (to avoid an infinite loop) and a change to one other test
case. Now the change looks correct, but I don't know if there may be
unintended optimization regressions in some backend...

Reverting the behavior for vectors is easy enough, though, just remove the
->getScalarType() in foldSelectExtConst and FoldOpIntoSelect.

spatel added a comment.Aug 3 2016, 1:19 PM

Simplified the test cases (baseline is already in SVN), and use the APInt
hint, thanks!

About the vector case: As you can see, it requires a change to
FoldOpIntoSelect (to avoid an infinite loop) and a change to one other test
case. Now the change looks correct, but I don't know if there may be
unintended optimization regressions in some backend...

Can you add a test case like:

define <2 x i32> @scalar_select_of_vectors(<2 x i1> %cca, i1 %ccb) {
  %ccax = zext <2 x i1> %cca to <2 x i32>
  %r = select i1 %ccb, <2 x i32> %ccax, <2 x i32> <i32 0, i32 0>   ; scalar condition
  ret <2 x i32> %r
}

Also, I'm curious why this wouldn't be good for non-i1 types. Did you see a case where it caused a problem? Particularly in the case of vectors, I think we should be performing ops in smaller types as much as possible before extending to a wider type (ref: https://llvm.org/bugs/show_bug.cgi?id=28160 ). Add a 'TODO' comment about that?

nhaehnle updated this revision to Diff 66772.Aug 4 2016, 1:53 AM

That type of test was actually already there, but I added a copy with zext
instead of sext for good measure.

I added a TODO about handling larger types as well.

To clarify, I didn't see a case where extending to non-i1 types caused a problem (other than the necessary FoldOpIntoSelect change), but I also didn't see cases where it helped, and in any case I was only looking at the AMDGPU assembly results. I expect that the more common CPU targets could be sensitive to a change there, that's why I kept it conservative.

spatel accepted this revision.Aug 4 2016, 5:41 AM
spatel edited edge metadata.

LGTM. Thanks!

This revision is now accepted and ready to land.Aug 4 2016, 5:41 AM
This revision was automatically updated to reflect the committed changes.