
[InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type
ClosedPublic

Authored by craig.topper on Aug 8 2017, 6:02 PM.

Details

Summary

This is similar to what was already done in foldSelectICmpAndOr. Ultimately I'd like to see if we can call foldSelectICmpAnd from foldSelectIntoOp if we detect a power of 2 constant. This would allow us to remove foldSelectICmpAndOr entirely.

As I'm writing this I wonder if we're also missing (icmp slt X, 0) and (icmp sgt X, -1) without any truncates.

The vector tests added should already work without this change since we don't turn compares with ands into truncates for vectors.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Aug 8 2017, 6:02 PM
spatel edited edge metadata.Aug 14 2017, 10:14 AM

I've been operating under the assumption that we want to transform in the opposite direction in IR. Ie, preserve and probably create more select-of-constants (see https://reviews.llvm.org/D24480 which I am planning to abandon). We should be able to reduce selects to logic/math in the DAG where it makes sense (eg, https://reviews.llvm.org/rL310717 ). Some reasons to prefer select were noted in:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html

Not sure if it was listed there, but another reason we might actually want to keep a cmp/select in IR is in the case where we have profile info / branch stats. If we know in IR that a cmp always goes one direction, then the target might prefer to turn the select into control flow instead of turning it into logic ops. There's some infrastructure for this in CGP already. See also: https://reviews.llvm.org/D36081

This patch is really just making InstCombine self consistent. We currently optimize this case differently depending on whether i8 is legal in datalayout.

define i32 @test71(i32 %x) {
; CHECK-LABEL: @test71(
; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[X:%.*]], 6
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 2
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP2]], 40
; CHECK-NEXT: ret i32 [[TMP3]]
;

%1 = and i32 %x, 128
%2 = icmp eq i32 %1, 0
%3 = select i1 %2, i32 40, i32 42
ret i32 %3

}

If we want to remove foldSelectICmpAnd that's a different question.


Ah, I didn't recognize what was going on. This is a sibling to D22537. Can you include a test that has a trunc in it from the start, so we are not dependent on the other combine? A code comment to show the complete transform would also make it a bit clearer for me.

FWIW, test71 is converted to math in the x86 backend for all 3 possibilities, but this doesn't happen for AArch64 or PPC where it's also likely a win. And for x86, it's different asm in all 3 cases:

With mask+cmp+sel:

%1 = and i32 %x, 128
%2 = icmp eq i32 %1, 0
%3 = select i1 %2, i32 40, i32 42
ret i32 %3
-->
andl	$128, %edi
shrl	$6, %edi
leal	40(%rdi), %eax

With shift+mask+or:

%1 = lshr i32 %x, 6
%2 = and i32 %1, 2
%3 = or i32 %2, 40
-->
shrl	$6, %edi
andl	$2, %edi
leal	40(%rdi), %eax

With trunc+cmp+sel:

%1 = trunc i32 %x to i8
%2 = icmp sgt i8 %1, -1
%3 = select i1 %2, i32 40, i32 42
-->
xorl	%eax, %eax
testb	%dil, %dil
sets	%al
leal	40(%rax,%rax), %eax

Rewritten to use decomposeBitTestICmp.

The use of decomposeBitTestICmp removes the truncate check I had in the old patch. It's not really needed; we just care about the mask. This exposed that I almost broke another transform in here that used ashr to generate constants for selects. Reordered the combines a little and added a TODO to try to merge these combines. We may want to look at turning the lshr+and+or sequence into the ashr pattern independently too.

Looks like this also exposed that canEvaluateZExtd can't handle vector shifts correctly. So that prevented the zext and trunc from being folded in the vector tests. I'll submit a follow up patch for that.

spatel added inline comments.Aug 15 2017, 2:22 PM
lib/Transforms/InstCombine/InstCombineSelect.cpp
631 ↗(On Diff #111212)

'IC->getPredicate()' can be shortened to 'Pred'?

634 ↗(On Diff #111212)

IsEqualZero is not initialized on this path.

test/Transforms/InstCombine/select-with-bitwise-ops.ll
496–498 ↗(On Diff #111212)

This is miscompiling:
http://rise4fun.com/Alive/ACO

Should be the same ops as the scalar test?

craig.topper added inline comments.Aug 15 2017, 2:31 PM
test/Transforms/InstCombine/select-with-bitwise-ops.ll
496–498 ↗(On Diff #111212)

Yeah, it should, but it isn't. See the child revision of this, D36763.

craig.topper added inline comments.Aug 15 2017, 2:49 PM
test/Transforms/InstCombine/select-with-bitwise-ops.ll
496–498 ↗(On Diff #111212)

Ah, oops. I thought you were asking about the other vector tests, not this one. I believe the scalar matching code uses ConstantInt. I can submit another patch to fix it too.

Use Pred more consistently throughout. Remove EqualAnd and just use Pred directly later.

The comment about multi-uses of the compare in D36711 made me wonder what we're doing here. Should we check one-use before trying to transform?

define i32 @test71_multi_use_cmp(i32 %x) {
  %t1 = and i32 %x, 128
  %t2 = icmp ne i32 %t1, 0
  %t3 = select i1 %t2, i32 40, i32 42
  %t4 = select i1 %t2, i32 60, i32 82
  %add = add i32 %t3, %t4
  ret i32 %add
}

$ ./opt -instcombine test71.ll -S

define i32 @test71_multi_use_cmp(i32 %x) {
  %t1 = and i32 %x, 128
  %t2 = icmp eq i32 %t1, 0
  %1 = lshr exact i32 %t1, 6
  %2 = xor i32 %1, 42
  %t4 = select i1 %t2, i32 82, i32 60
  %add = add nuw nsw i32 %2, %t4
  ret i32 %add
}

Yeah, we probably need to check for one use, but I'd like to do that separately from this.

I want to reimplement foldSelectICmpAndOr by calling foldSelectICmpAnd from foldSelectIntoOp if the immediate in foldSelectIntoOp is a power of 2. This will allow the transform in foldSelectICmpAndOr to support Xor as well because there was no reason to restrict it to just Or. And of course it will remove some very similar duplicated code. To do that I'll need to revisit all the one use enhancements that were already added to foldSelectICmpAndOr anyway.

Rebase after fixing the canEvaluateTruncated/ZExtd issues with vectors.

Fix bad comment placement

spatel added inline comments.Aug 16 2017, 1:26 PM
test/Transforms/InstCombine/select-with-bitwise-ops.ll
425–431 ↗(On Diff #111277)

Sorry if I missed it, but do you know why canEvaluateZExtd() failed on this?
If you have a fix, I'd rather see that go in first because this doesn't look better in IR or x86 codegen.

Rebasing this on top of D36781 which fixes the ashr transform for vectors.

@spatel is this ok now? I'd like to work on replacing the foldSelectICmpAndOr implementation using this.

spatel added inline comments.Aug 18 2017, 3:32 PM
test/Transforms/InstCombine/select-with-bitwise-ops.ll
425–431 ↗(On Diff #111277)

This question came in close proximity to the last update, so you might not have seen it.

I believe the issue is that this needs to handle And slightly differently. I think if MaskedValueIsZero returns true for the And, we should reset the BitsToClear to 0 before returning true.

// If the operation is an AND/OR/XOR and the bits to clear are zero in the
// other side, BitsToClear is ok.
if (Tmp == 0 && I->isBitwiseLogicOp()) {
  // We use MaskedValueIsZero here for generality, but the case we care
  // about the most is constant RHS.
  unsigned VSize = V->getType()->getScalarSizeInBits();
  if (IC.MaskedValueIsZero(I->getOperand(1),
                           APInt::getHighBitsSet(VSize, BitsToClear),
                           0, CxtI))
    return true;
}


Looks like that problem will be fixed in D36944. I think that will make this one good, but please update after that.

I'm still trying to get the backend prepared for IR to go in the other direction with patches like D36840.

Update after r311343

spatel accepted this revision.Aug 21 2017, 10:47 AM

LGTM.

This revision is now accepted and ready to land.Aug 21 2017, 10:47 AM
This revision was automatically updated to reflect the committed changes.