This is an archive of the discontinued LLVM Phabricator instance.

Differential D33338

[InstCombineCasts] Take in account final size when transforming sext->lshr->trunc patterns
Closed, Public

Authored by davide on May 18 2017, 2:20 PM.

Details

Reviewers

spatel

Commits

rG21a49dcdf19e: [InstCombine] Take in account the size in sext->lshr->trunc patterns.
rL303513: [InstCombine] Take in account the size in sext->lshr->trunc patterns.

Summary

Counter example where this miscompiles:

define i8 @tinky() {
  %sext = sext i1 1 to i16
  %hibit = lshr i16 %sext, 15
  %tr = trunc i16 %hibit to i8
  ret i8 %tr
}

define i8 @winky() {
  %sext = sext i1 1 to i8
  ret i8 %sext
}

which gets folded to:

define i8 @tinky() {
  ret i8 1
}

define i8 @winky() {
  ret i8 -1
}

Alive confirms: http://rise4fun.com/Alive/plX
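The arithmetic in the counterexample can also be replayed with a small fixed-width model. This is only a sketch and not LLVM code: the helpers `maskTo`, `sextTo`, `ashr`, `tinkyOriginal` and `tinkyBuggyFold` are made-up names, and the shape of the bad fold (an `ashr` in the narrow type followed by a signed cast to the result type) is an assumption based on the pre-patch transform shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative helpers, not LLVM APIs. Widths are assumed to be in [1, 16].

// Keep only the low `bits` bits of v.
uint32_t maskTo(uint32_t v, unsigned bits) {
  return bits >= 32 ? v : (v & ((1u << bits) - 1u));
}

// Interpret the low `from` bits of v as signed and convert to `to` bits.
// For to > from this is a sext; for to < from it degenerates to a trunc.
uint32_t sextTo(uint32_t v, unsigned from, unsigned to) {
  v = maskTo(v, from);
  if (v & (1u << (from - 1)))
    v |= ~((1u << from) - 1u); // replicate the sign bit upward
  return maskTo(v, to);
}

// Arithmetic shift right inside a `bits`-wide type (amt < bits).
uint32_t ashr(uint32_t v, unsigned bits, unsigned amt) {
  return maskTo(sextTo(v, bits, 32) >> amt, bits);
}

// @tinky as written: trunc i16 (lshr i16 (sext i1 1 to i16), 15) to i8.
uint32_t tinkyOriginal() {
  return maskTo(sextTo(1, 1, 16) >> 15, 8);
}

// Assumed shape of the bad fold: sext (ashr i1 1, min(15, ASize - 1)) to i8.
uint32_t tinkyBuggyFold() {
  unsigned shiftAmt = std::min(15u, 1u - 1u);
  return sextTo(ashr(1, 1, shiftAmt), 1, 8);
}
```

Under this model `tinkyOriginal()` is 1 and `tinkyBuggyFold()` is 0xFF, i.e. -1 as an i8, which matches the two different constants in the folded output.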

Diff Detail

Repository: rL LLVM

Event Timeline

This also refactors the codepath for clarity (as suggested by Sanjay).

jacobly added a subscriber: jacobly. May 18 2017, 2:29 PM

jacobly added inline comments. May 18 2017, 7:01 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
Line 608 (on Diff #99493): This needs to be checked in every case, so maybe it should be factored out.
Line 611 (on Diff #99493): This code also handles `CISize < ASize` correctly: http://rise4fun.com/Alive/eSU. There was no need to check `ShiftAmt <= SExtSize - ASize` in the first place since we only care about zero bits being shifted into the final type.

I think it's good to solve the miscompile case ASAP, but fundamentally, this block of code is error-prone and incomplete because we're missing earlier folds that would have kept us out of trouble, so that we never have to reason about a 3-instruction sequence: trunc(lshr (sext A), C).

For example, the modified PR33078 pattern in the test here could be canonicalized like this:

Name: smear_hibit
%sext = sext i3 %x to i16
%r = lshr i16 %sext, 13
  =>
%s = ashr i3 %x, 2
%r = zext i3 %s to i16

I.e., we should lift the shift because narrower ops are easier for value tracking. We might also consider it an optimization / strength-reduction to replace a wide shift with a narrow shift.

If we had this pattern, the zext followed by trunc would get optimized away cleanly using an existing simple fold that's easy to reason about.
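The `smear_hibit` rewrite can be checked exhaustively over the eight i3 inputs. The sketch below uses a hand-rolled fixed-width model rather than LLVM itself; all function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers, not LLVM APIs. Widths are assumed to be in [1, 16].

// Keep only the low `bits` bits of v.
uint32_t maskTo(uint32_t v, unsigned bits) {
  return bits >= 32 ? v : (v & ((1u << bits) - 1u));
}

// Sign-extend the low `from` bits of v to `to` bits.
uint32_t sextTo(uint32_t v, unsigned from, unsigned to) {
  v = maskTo(v, from);
  if (v & (1u << (from - 1)))
    v |= ~((1u << from) - 1u); // replicate the sign bit upward
  return maskTo(v, to);
}

// Arithmetic shift right inside a `bits`-wide type (amt < bits).
uint32_t ashr(uint32_t v, unsigned bits, unsigned amt) {
  return maskTo(sextTo(v, bits, 32) >> amt, bits);
}

// Left side: lshr i16 (sext i3 %x to i16), 13.
uint32_t smearWide(uint32_t x) { return sextTo(x, 3, 16) >> 13; }

// Right side: zext (ashr i3 %x, 2) to i16. The 3-bit result is already
// zero-extended in this representation.
uint32_t smearNarrow(uint32_t x) { return ashr(x, 3, 2); }

// True if both sides agree for every i3 value.
bool smearHibitHolds() {
  for (uint32_t x = 0; x < 8; ++x)
    if (smearWide(x) != smearNarrow(x))
      return false;
  return true;
}
```

Both sides produce 7 for negative i3 inputs and 0 otherwise, so a later trunc to i8 keeps 0 or 7 rather than the sign-extended -1.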

In the actual PR33078 problem description case, there is no question that this is a missed fold:

Name: bool_hibit
%sext = sext i1 %x to i16
%hibit = lshr i16 %sext, 15
  =>
%hibit = zext i1 %x to i16

This should again become a zext/trunc pair for PR33078, and we would never reach the buggy block of code.
I would prefer to solve the bug by fixing the most basic patterns. By doing that, we should be able to simplify - or hopefully remove entirely - this block of code that is looking for longer patterns.

So what you're suggesting here is to canonicalize and remove this Xform entirely?

In D33338#759474, @davide wrote:

So what you're suggesting here is to canonicalize and remove this Xform entirely?

Yes - but I acknowledge this may be a longer path than the quick fix of just adding extra predicates to this transform directly to avoid the known bug.

If you think it's better to have the quick fix to avoid the known miscompile, I would support that solution with a FIXME note about trying to get rid of this transform.

Here are some tests for the smaller patterns that we might want to match:

; FIXME: The bool bit got smeared across a wide val, but then we zero'd out those bits. This is just a zext.

define i16 @bool_zext(i1 %x) {
  %sext = sext i1 %x to i16
  %hibit = lshr i16 %sext, 15
  ret i16 %hibit
}

define <2 x i8> @bool_zext_splat(<2 x i1> %x) {
  %sext = sext <2 x i1> %x to <2 x i8>
  %hibit = lshr <2 x i8> %sext, <i8 7, i8 7>
  ret <2 x i8> %hibit
}

; FIXME: We could use a narrow arithmetic shift first and then zext.

define i16 @smear_sign_and_widen(i4 %x) {
  %sext = sext i4 %x to i16
  %hibit = lshr i16 %sext, 12
  ret i16 %hibit
}

define <2 x i8> @smear_sign_and_widen_splat(<2 x i6> %x) {
  %sext = sext <2 x i6> %x to <2 x i8>
  %hibit = lshr <2 x i8> %sext, <i8 2, i8 2>
  ret <2 x i8> %hibit
}

; FIXME: All of the replicated sign bits are wiped out by the lshr. This could be lshr+zext.

define i16 @fake_sext(i3 %x) {
  %sext = sext i3 %x to i16
  %sh = lshr i16 %sext, 15
  ret i16 %sh
}

define <2 x i8> @fake_sext_splat(<2 x i3> %x) {
  %sext = sext <2 x i3> %x to <2 x i8>
  %sh = lshr <2 x i8> %sext, <i8 7, i8 7>
  ret <2 x i8> %sh
}
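Each FIXME above claims that a narrower sequence computes the same value. A rough way to sanity-check those claims is to enumerate every input under a small bit-width model, as sketched here. The helper names are hypothetical, and the narrow shift amounts passed in (0, 3, 2 and so on) are my assumption about what the narrow form would look like; the FIXME comments do not spell them out. Vector tests are modeled one lane at a time.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers, not LLVM APIs. Widths are assumed to be in [1, 16].

// Keep only the low `bits` bits of v.
uint32_t maskTo(uint32_t v, unsigned bits) {
  return bits >= 32 ? v : (v & ((1u << bits) - 1u));
}

// Sign-extend the low `from` bits of v to `to` bits.
uint32_t sextTo(uint32_t v, unsigned from, unsigned to) {
  v = maskTo(v, from);
  if (v & (1u << (from - 1)))
    v |= ~((1u << from) - 1u); // replicate the sign bit upward
  return maskTo(v, to);
}

// Wide form from the tests: lshr (sext x from aBits to wBits), amt.
uint32_t wideForm(uint32_t x, unsigned aBits, unsigned wBits, unsigned amt) {
  return sextTo(x, aBits, wBits) >> amt;
}

// Candidate narrow form: zext (ashr x, k) or zext (lshr x, k), with the
// shift done in the aBits-wide type.
uint32_t narrowForm(uint32_t x, unsigned aBits, bool arithmetic, unsigned k) {
  uint32_t src = arithmetic ? sextTo(x, aBits, 32) : maskTo(x, aBits);
  return maskTo(src >> k, aBits);
}

// True if the wide and narrow forms agree for every aBits-wide input.
bool agreeOnAllInputs(unsigned aBits, unsigned wBits, unsigned amt,
                      bool arithmetic, unsigned k) {
  for (uint32_t x = 0; x < (1u << aBits); ++x)
    if (wideForm(x, aBits, wBits, amt) != narrowForm(x, aBits, arithmetic, k))
      return false;
  return true;
}
```

For instance, `agreeOnAllInputs(4, 16, 12, true, 3)` models `smear_sign_and_widen` as `zext (ashr i4 %x, 3)`, and `agreeOnAllInputs(3, 16, 15, false, 2)` models `fake_sext` as `zext (lshr i3 %x, 2)`.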

In D33338#759486, @spatel wrote:

Here are some tests for the smaller patterns that we might want to match:

Feel free to commit these tests, I think they're good.
I'm worried about fixing the miscompile. I'll commit some tests where we get this wrong, then upload a new version of the patch so that you can take a look.

Updated after Sanjay's/Jacob's feedback. Thanks for your review!

Correct version of the patch & clang-formatted.

This is strictly safer (does the transform less often), so LGTM. Bonus: we're not creating an unnecessary cast in the case where the result size is the same as the source op.

@davide, let me know if you plan to work on the (lshr (sext X), C) patterns. I checked in the tests with rL303504.

This revision is now accepted and ready to land. May 21 2017, 8:22 AM

In D33338#760426, @spatel wrote:

This is strictly safer (does the transform less often), so LGTM. Bonus: we're not creating an unnecessary cast in the case where the result size is the same as the source op.

@davide, let me know if you plan to work on the (lshr (sext X), C) patterns. I checked in the tests with rL303504.

I do plan to work on them, but if you have some bandwidth, any help would be appreciated :)
Should we open a bug so that we don't forget?

Closed by commit rL303513: [InstCombine] Take in account the size in sext->lshr->trunc patterns. (authored by davide). May 21 2017, 1:30 PM

This revision was automatically updated to reflect the committed changes.

In D33338#760470, @davide wrote:

In D33338#760426, @spatel wrote:

This is strictly safer (does the transform less often), so LGTM. Bonus: we're not creating an unnecessary cast in the case where the result size is the same as the source op.

@davide, let me know if you plan to work on the (lshr (sext X), C) patterns. I checked in the tests with rL303504.

I do plan to work on them, but if you have some bandwidth, any help would be appreciated :)
Should we open a bug so that we don't forget?

Bug reports are always fine with me, but since this one is in my mental cache right now, I'd rather just squash it now. :)

The interesting thing about this case is that it feeds into the larger current discussion about how to split up InstCombine to make the whole thing less obnoxious.

Take these two cases:

Name: shift_less
%sext = sext i3 %x to i8
%hibit = lshr i8 %sext, 7
  =>
%tr = lshr i3 %x, 2
%hibit = zext i3 %tr to i8

Name: bool_hibit
%sext = sext i1 %x to i8
%hibit = lshr i8 %sext, 7
  =>
%hibit = zext i1 %x to i8

If we think of InstCombine as a pure canonicalizer, we might make a lshr rule like (semi-pseudo-code):

// Are we moving the sign bit to the low bit and widening a value with high zeros?
// lshr (sext X), C --> zext (lshr X, C2)
if (match(Op0, m_SExt(m_Value(X))) &&
    match(Op1, m_APInt(C)) &&
    C->getZExtValue() == Op0->getType()->getScalarSizeInBits() - 1) {
  return createZext(createLShr(X, X->getType()->getScalarSizeInBits() - 1));
}

In other words, we would not special-case the i1 (bool) example to eliminate this no-op before it ever existed:
%tr = lshr i1 %x, 0
...we'd leave that to the optimizer pass because eliminating that instruction is an optimization, not a canonicalization.
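To make the rule concrete, here is a sketch that checks `lshr (sext X), W-1 --> zext (lshr X, N-1)` for a range of source widths N and wide widths W, and shows that N = 1 yields a narrow shift amount of 0. It is a plain bit-width model, not InstCombine code, and every name in it is invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers, not LLVM APIs. Widths are assumed to be in [1, 16].

// Keep only the low `bits` bits of v.
uint32_t maskTo(uint32_t v, unsigned bits) {
  return bits >= 32 ? v : (v & ((1u << bits) - 1u));
}

// Sign-extend the low `from` bits of v to `to` bits.
uint32_t sextTo(uint32_t v, unsigned from, unsigned to) {
  v = maskTo(v, from);
  if (v & (1u << (from - 1)))
    v |= ~((1u << from) - 1u); // replicate the sign bit upward
  return maskTo(v, to);
}

// Shift amount the rule would emit for the narrow lshr: source width - 1.
unsigned narrowShiftAmount(unsigned aBits) { return aBits - 1; }

// Before: lshr (sext x from aBits to wBits), wBits - 1.
uint32_t signToLowBitWide(uint32_t x, unsigned aBits, unsigned wBits) {
  return sextTo(x, aBits, wBits) >> (wBits - 1);
}

// After: zext (lshr x, aBits - 1).
uint32_t signToLowBitNarrow(uint32_t x, unsigned aBits) {
  return maskTo(x, aBits) >> narrowShiftAmount(aBits);
}

// Check every source width 1..8, every wider type up to 16 bits, every input.
bool ruleHoldsForAllWidths() {
  for (unsigned a = 1; a <= 8; ++a)
    for (unsigned w = a + 1; w <= 16; ++w)
      for (uint32_t x = 0; x < (1u << a); ++x)
        if (signToLowBitWide(x, a, w) != signToLowBitNarrow(x, a))
          return false;
  return true;
}
```

`narrowShiftAmount(1)` is 0, which is exactly the no-op shift a pure canonicalizer would create for the bool case and leave for a later cleanup.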

Now, we might just say this is a bug in the IRBuilder - why doesn't it check for a zero shift constant before creating an obviously unnecessary instruction? On the other hand, that's not its job. It shouldn't think at all about that special case; that's the caller's responsibility.

We'll find many examples like this if we start trying to create a pure canonicalizer. If I interpreted the statement correctly, @dberlin mentioned this potential inefficiency in:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113220.html

spatel mentioned this in rL303860: [InstCombine] make icmp-mul fold more efficient. May 25 2017, 7:14 AM

spatel mentioned this in D33879: [InstCombine] fold lshr (sext X), C1 --> zext (lshr X, C2). Jun 4 2017, 8:41 AM

spatel mentioned this in rL304939: [InstCombine] fold lshr (sext X), C1 --> zext (lshr X, C2). Jun 7 2017, 1:32 PM

Revision Contents

Path                                                         Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineCasts.cpp   19 lines
llvm/trunk/test/Transforms/InstCombine/cast.ll               5 lines

Diff 99702

llvm/trunk/lib/Transforms/InstCombine/InstCombineCasts.cpp

[579 lines hidden above; enclosing statement: `if (Src->hasOneUse() &&`]

       // Since we're doing an lshr and a zero extend, and know that the shift
       // amount is smaller than ASize, it is always safe to do the shift in A's
       // type, then zero extend or truncate to the result.
       Value *Shift = Builder->CreateLShr(A, Cst->getZExtValue());
       Shift->takeName(Src);
       return CastInst::CreateIntegerCast(Shift, DestTy, false);
     }

+    // FIXME: We should canonicalize to zext/trunc and remove this transform.
     // Transform trunc(lshr (sext A), Cst) to ashr A, Cst to eliminate type
     // conversion.
     // It works because bits coming from sign extension have the same value as
     // the sign bit of the original value; performing ashr instead of lshr
     // generates bits of the same value as the sign bit.
     if (Src->hasOneUse() &&
         match(Src, m_LShr(m_SExt(m_Value(A)), m_ConstantInt(Cst)))) {
       Value *SExt = cast<Instruction>(Src)->getOperand(0);
       const unsigned SExtSize = SExt->getType()->getPrimitiveSizeInBits();
       const unsigned ASize = A->getType()->getPrimitiveSizeInBits();
+      const unsigned CISize = CI.getType()->getPrimitiveSizeInBits();
+      const unsigned MaxAmt = SExtSize - std::max(CISize, ASize);
       unsigned ShiftAmt = Cst->getZExtValue();

       // This optimization can be only performed when zero bits generated by
       // the original lshr aren't pulled into the value after truncation, so we
       // can only shift by values no larger than the number of extension bits.
       // FIXME: Instead of bailing when the shift is too large, use and to clear
       // the extra bits.
-      if (SExt->hasOneUse() && ShiftAmt <= SExtSize - ASize) {
-        // If shifting by the size of the original value in bits or more, it is
-        // being filled with the sign bit, so shift by ASize-1 to avoid ub.
+      if (ShiftAmt <= MaxAmt) {
+        if (CISize == ASize)
+          return BinaryOperator::CreateAShr(A, ConstantInt::get(CI.getType(),
+                                            std::min(ShiftAmt, ASize - 1)));
+        if (SExt->hasOneUse()) {
           Value *Shift = Builder->CreateAShr(A, std::min(ShiftAmt, ASize-1));
           Shift->takeName(Src);
           return CastInst::CreateIntegerCast(Shift, CI.getType(), true);
         }
       }
+    }

     if (Instruction *I = shrinkBitwiseLogic(CI))
       return I;

     if (Instruction *I = shrinkSplatShuffle(CI, Builder))
       return I;

     if (Instruction *I = shrinkInsertElt(CI, Builder))

[1,610 lines hidden below]
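One way to gain confidence in the new `MaxAmt` bound is to brute-force small widths. The sketch below compares the original `trunc(lshr (sext A), Cst)` against the `ashr` plus signed-cast replacement under both the old guard (`ShiftAmt <= SExtSize - ASize`) and the patched one (`ShiftAmt <= SExtSize - max(CISize, ASize)`). It is a standalone model with hypothetical helper names, it ignores the one-use checks, and it only covers widths up to 8 bits, so it is evidence rather than proof.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative helpers, not LLVM APIs. Widths are assumed to be in [1, 16].

// Keep only the low `bits` bits of v.
uint32_t maskTo(uint32_t v, unsigned bits) {
  return bits >= 32 ? v : (v & ((1u << bits) - 1u));
}

// Signed integer cast: sext when to > from, trunc when to < from.
uint32_t sextTo(uint32_t v, unsigned from, unsigned to) {
  v = maskTo(v, from);
  if (v & (1u << (from - 1)))
    v |= ~((1u << from) - 1u); // replicate the sign bit upward
  return maskTo(v, to);
}

// Arithmetic shift right inside a `bits`-wide type (amt < bits).
uint32_t ashr(uint32_t v, unsigned bits, unsigned amt) {
  return maskTo(sextTo(v, bits, 32) >> amt, bits);
}

// trunc (lshr (sext x from aSize to sextSize), amt) to ciSize.
uint32_t original(uint32_t x, unsigned aSize, unsigned sextSize,
                  unsigned amt, unsigned ciSize) {
  return maskTo(sextTo(x, aSize, sextSize) >> amt, ciSize);
}

// Signed cast of (ashr x, min(amt, aSize - 1)) to ciSize.
uint32_t folded(uint32_t x, unsigned aSize, unsigned amt, unsigned ciSize) {
  uint32_t sh = ashr(x, aSize, std::min(amt, aSize - 1));
  return sextTo(sh, aSize, ciSize);
}

// Count inputs where the fold changes the result while the guard allows it.
unsigned countMismatches(bool usePatchedGuard) {
  unsigned bad = 0;
  for (unsigned s = 2; s <= 8; ++s)
    for (unsigned a = 1; a < s; ++a)
      for (unsigned ci = 1; ci < s; ++ci)
        for (unsigned amt = 0; amt < s; ++amt) {
          unsigned maxAmt = usePatchedGuard ? s - std::max(ci, a) : s - a;
          if (amt > maxAmt)
            continue; // the transform bails out here
          for (uint32_t x = 0; x < (1u << a); ++x)
            if (original(x, a, s, amt, ci) != folded(x, a, amt, ci))
              ++bad;
        }
  return bad;
}
```

In this model the patched guard produces no mismatches, while the old guard does as soon as the result type is wider than A and the shift reaches into the bits that the trunc keeps, which is the `pr33078_4` situation (i3 to i16, shift by 13, trunc to i8).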

llvm/trunk/test/Transforms/InstCombine/cast.ll

[1,507 lines hidden above]

 ;
   %C = lshr i16 %B, 12
   %D = trunc i16 %C to i4
   ret i4 %D
 }

 define i8 @pr33078_4(i3 %x) {
 ; Don't turn this in an `ashr`. This was getting miscompiled
 ; CHECK-LABEL: @pr33078_4(
-; CHECK-NEXT: [[C:%.*]] = ashr i3 %x, 2
-; CHECK-NEXT: [[B:%.*]] = sext i3 [[C]] to i8
+; CHECK-NEXT: [[B:%.*]] = sext i3 %x to i16
+; CHECK-NEXT: [[C:%.*]] = lshr i16 [[B]], 13
+; CHECK-NEXT: [[D:%.*]] = trunc i16 [[C]] to i8
 ; CHECK-NEXT: ret i8 [[D]]
   %B = sext i3 %x to i16
   %C = lshr i16 %B, 13
   %D = trunc i16 %C to i8
   ret i8 %D
 }