This is an archive of the discontinued LLVM Phabricator instance.


Differential D36395

[InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046)
Closed, Public

Authored by spatel on Aug 7 2017, 7:43 AM.


Details

Reviewers

craig.topper
efriedma
majnemer

Commits

rGc50e55d0e63d: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc…
rL310509: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc…

Summary

I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141

The truncated rotate-by-variable patterns elude all of the existing transforms for two reasons: the values have multiple uses, and the demanded-bits and known-bits information needed to simplify them only exists when the whole pattern is visible. So we need an unfortunately large pattern match. But once this pattern is simplified in IR, the backend can already generate rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic. Note that rotate-by-constant doesn't have this problem: smaller folds should already produce the narrow IR ops.
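Here is a minimal C++ sketch of where the wide pattern comes from. It assumes a 32-bit `int`. A 16-bit rotate written on `uint16_t` operands is evaluated in `int` after integer promotion, so it actually performs the zext/shl/lshr/or/trunc sequence matched here (compare the `rotate_left_16bit` test in rotate.ll). The function names are hypothetical. The check only confirms that the promoted expression always computes a plain 16-bit rotate, which is what makes narrowing legitimate.

```cpp
#include <cassert>
#include <cstdint>

// A 16-bit rotate-left as commonly written in C or C++ source. Because of
// integer promotion, 'v' is widened to int before shifting, so this computes
// the wide pattern: zext, shl by s, lshr by (16 - s), or, then trunc.
uint16_t rotl16Promoted(uint16_t v, unsigned shift) {
  unsigned s = shift & 15;  // mirrors 'and i32 %shift, 15'
  return static_cast<uint16_t>((v << s) | (v >> (16 - s)));
}

// Reference rotate-left done one bit position at a time, entirely in 16 bits.
uint16_t rotl16Reference(uint16_t v, unsigned shift) {
  for (unsigned i = 0; i < (shift & 15); ++i)
    v = static_cast<uint16_t>((v << 1) | (v >> 15));
  return v;
}

// Exhaustively check that the wide computation only ever yields a 16-bit
// rotate, for every 16-bit value and every masked shift amount.
bool promotedIsNarrowRotate() {
  for (unsigned v = 0; v <= 0xFFFF; ++v)
    for (unsigned s = 0; s < 16; ++s)
      if (rotl16Promoted(static_cast<uint16_t>(v), s) !=
          rotl16Reference(static_cast<uint16_t>(v), s))
        return false;
  return true;
}
```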

For the motivating cases from the bug report, in addition to using narrow ops, we get a net win of two fewer instructions (we kill 3 zext/trunc ops but add a mask op). I initially forgot that we need that mask, but Alive confirms that it would be wrong to leave off the mask of the opposite shift amount:
http://rise4fun.com/Alive/GSy
http://rise4fun.com/Alive/hif
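A little arithmetic shows why the mask is needed. In the wide type, the opposite amount `16 - s` is at most 16, which is harmless for an i32 shift. If that amount were used directly on i16 values, `s == 0` would produce a shift by 16, the full bit width. LLVM IR does not define such a shift: the result is poison. Masking `0 - s` with 15 gives 0 in that case and `16 - s` for every other amount. The hypothetical helpers below only check that arithmetic for a 16-bit rotate; they are not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Opposite amount if '16 - s' were simply truncated to i16 and used as-is.
// For s == 0 this is 16, the full bit width of i16.
unsigned oppositeAmountUnmasked(unsigned s) { return 16u - (s & 15u); }

// Opposite amount computed in i16 as (0 - s) & 15.
unsigned oppositeAmountMasked(unsigned s) {
  return static_cast<uint16_t>(0u - s) & 15u;
}

// True if every opposite amount for s in [0, 15] is a valid i16 shift
// amount, i.e. strictly less than 16.
bool allAmountsInRange(bool masked) {
  for (unsigned s = 0; s < 16; ++s) {
    unsigned amt = masked ? oppositeAmountMasked(s) : oppositeAmountUnmasked(s);
    if (amt >= 16) return false;
  }
  return true;
}
```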


Event Timeline

spatel created this revision. Aug 7 2017, 7:43 AM

Herald added a subscriber: mcrosier. Aug 7 2017, 7:43 AM

spatel retitled this revision from [InstCombine] narrow rotate left/patterns to eliminate zext/trunc (PR34046) to [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc (PR34046). Aug 7 2017, 7:57 AM

I spent a bit of time playing with the generated code for various implementations of rotate for 16-bit values. It looks like this transform actually makes things worse; e.g. on ARM the transformed version compiles to one more instruction. Granted, that's probably not important; I think the optimal lowering for a 16-bit rotate on ARM actually involves lowering it to a 32-bit rotate and fixing up the result, and that's probably a transform which doesn't make sense until legalization.

Also, on x86, it looks like we're missing an important pattern here: we generate the sequence andb $15, %cl; rolw %cl, %ax.

lib/Transforms/InstCombine/InstCombineCasts.cpp
468	Would it make sense to use known bits here rather than explicitly checking for zext? This matters because we tend to transform `zext(trunc(x)) -> x & c`.
492	It seems like generating the mask is worth doing here, if we think it'll be eliminated by later lowering.

I think we miss the `and x, 15` because we use the same pattern we use for shifts, where we look for 5-bit masking even in 16-bit mode because the hardware doesn't ignore bit 4. For shifts, the implication is obvious: if bit 4 is set, a 16-bit shift fills the result with 0s. For rotates, I think it just means we'd rotate around a second time and still get the correct result, but I'm not sure.
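The rotate case can be checked with a simplified model. For illustration only, the sketch below assumes that a 16-bit `rolw` masks its count to 5 bits and then rotates one bit position at a time. It ignores flags. Under that assumption, a count of 16 to 31 just rotates around a second time, so the extra `andb $15, %cl` never changes the rotated value. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption for illustration; flags ignored): a 16-bit
// rotate-left that masks its count to 5 bits, then rotates one bit at a time.
uint16_t rolw5BitCount(uint16_t v, uint8_t cl) {
  unsigned count = cl & 31u;
  for (unsigned i = 0; i < count; ++i)
    v = static_cast<uint16_t>((v << 1) | (v >> 15));
  return v;
}

// The same rotate after an explicit 'andb $15, %cl'.
uint16_t rolw4BitCount(uint16_t v, uint8_t cl) {
  return rolw5BitCount(v, static_cast<uint8_t>(cl & 15u));
}

// True if the extra 4-bit mask never changes the rotated value, checked over
// every count byte and a sample of 16-bit values.
bool extraMaskIsRedundant() {
  for (unsigned v = 0; v <= 0xFFFF; v += 257) {
    for (unsigned cl = 0; cl < 256; ++cl) {
      uint16_t x = static_cast<uint16_t>(v);
      uint8_t c = static_cast<uint8_t>(cl);
      if (rolw5BitCount(x, c) != rolw4BitCount(x, c)) return false;
    }
  }
  return true;
}
```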

In D36395#834263, @efriedma wrote:

Also, on x86, it looks like we're missing an important pattern here: we generate the sequence andb $15, %cl; rolw %cl, %ax.

Yes, I saw that too. This was noted back in:
https://bugs.llvm.org/show_bug.cgi?id=17332#c6
...so we likely need to add to the tablegen pattern-matching.

lib/Transforms/InstCombine/InstCombineCasts.cpp
468	Yes, that would be a more general solution.
492	OK - I wasn't sure if the extra instruction would make this transform less desirable. But yes, if we add the mask ourselves, then I think we'll be able to handle more sloppy versions of the incoming code. I'll add more tests.

In D36395#834263, @efriedma wrote:

I spent a bit of time playing with the generated code for various implementations of rotate for 16-bit values. It looks like this transform actually makes things worse; e.g. on ARM the transformed version compiles to one more instruction. Granted, that's probably not important; I think the optimal lowering for a 16-bit rotate on ARM actually involves lowering it to a 32-bit rotate and fixing up the result, and that's probably a transform which doesn't make sense until legalization.

I just looked at this, and I think it's a non-issue because the transform is guarded by shouldChangeType(). We shouldn't transform to i8/i16 for ARM/AArch64 because they don't list those types as legal in the datalayout:

$ ./clang -O1 rot16.c -S -o - -target aarch64 -emit-llvm
...
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

$ ./clang -O1 rot16.c -S -o - -target x86_64 -emit-llvm
...
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
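The legality check can be modeled from the `n...` component of those datalayout strings, which lists the native integer widths: `n32:64` for AArch64 and `n8:16:32:64` for x86-64. The sketch below is a simplified, narrowing-only version of the rule described above: a legal source width may not be narrowed to an illegal one. It is not LLVM's `shouldChangeType()` and skips that function's special cases. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: collect the native integer widths listed in the
// "n<size>:<size>..." component of a datalayout string.
std::set<unsigned> legalIntWidths(const std::string &layout) {
  std::set<unsigned> widths;
  std::stringstream components(layout);
  std::string comp;
  while (std::getline(components, comp, '-')) {
    // Keep only components spelled "n" followed by a digit.
    if (comp.size() < 2 || comp[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(comp[1])))
      continue;
    std::stringstream sizes(comp.substr(1));
    std::string size;
    while (std::getline(sizes, size, ':'))
      widths.insert(static_cast<unsigned>(std::stoul(size)));
  }
  return widths;
}

// Narrowing-only simplification of the legality rule: refuse to go from a
// legal integer width to an illegal one.
bool mayNarrow(const std::set<unsigned> &legal, unsigned from, unsigned to) {
  bool fromLegal = legal.count(from) != 0;
  bool toLegal = legal.count(to) != 0;
  return !(fromLegal && !toLegal);
}
```

With the AArch64 string, i16 is not listed, so narrowing i32 to i16 is refused; with the x86-64 string, both i8 and i16 are listed and the narrowing is allowed.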

Patch updated:

1. Add an assert for the shouldChangeType() guard that's in the caller function.
2. Replace the explicit match for zext of the shifted value with a call to MaskedValueIsZero().
3. Remove the MaskedValueIsZero() check for the shift amount and generate our own masks to make the transform safe:
   http://rise4fun.com/Alive/Qzm
4. Add tests to exercise #2 and #3.
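Change #2 amounts to checking that every bit above the narrow width is known to be zero in the shifted value. For example, both `zext i8 %v to i32` and `and i32 %v, 255` make the top 24 bits known zero. Below is a 32-bit sketch of that test. The helpers are hypothetical stand-ins for `APInt::getHighBitsSet()` and for the known-zero bits that a `MaskedValueIsZero()` query relies on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit stand-in for APInt::getHighBitsSet(32, numHiBits):
// a mask with the top numHiBits bits set.
uint32_t highBitsSet32(unsigned numHiBits) {
  assert(numHiBits <= 32);
  if (numHiBits == 0) return 0;  // avoid an out-of-range shift by 32
  return ~uint32_t{0} << (32 - numHiBits);
}

// The condition being tested: every bit above the narrow width must be
// known zero. 'knownZero' is assumed to come from some known-bits analysis
// of the shifted value.
bool highBitsKnownZero(uint32_t knownZero, unsigned narrowWidth) {
  uint32_t hiMask = highBitsSet32(32 - narrowWidth);
  return (knownZero & hiMask) == hiMask;
}
```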

Looks fine, but give it a couple of days to see if anyone else wants to review.

craig.topper added inline comments. Aug 8 2017, 4:38 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
484	Since we don't support vectors here, should this be getIntegerBitWidth()?
485	Could you have used m_SpecificInt in the matchers above to avoid this separate check?

spatel added inline comments. Aug 8 2017, 4:47 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
484	But we do support vectors - see the 2nd regression test.
485	Because we do support vectors, we can't do that?

I think I confused myself reading this assert. So the !isa<IntegerType> is for the vector case? Would it be more readable as isa<VectorType>?

assert((!isa<IntegerType>(Trunc.getSrcTy()) ||
        shouldChangeType(Trunc.getSrcTy(), Trunc.getType())) &&
       "Don't narrow to an illegal type");

m_SpecificInt should work on vectors, right?

/// \brief Match a specific integer value or vector with all elements equal to
/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }
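That doc comment explains why the separate width check can be folded into the matcher: the vector test's `sub <2 x i32> <i32 16, i32 16>, %and` uses a splat constant. The toy model below follows the documented matching rule. It uses a hypothetical `ConstValue` type in place of LLVM's constant classes and is not how PatternMatch is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an IR integer constant: a scalar has one lane,
// a vector constant has one entry per element.
struct ConstValue {
  std::vector<uint64_t> lanes;
};

// Toy version of the documented m_SpecificInt rule: the constant matches v
// if it is that integer, or a vector whose elements all equal v.
bool matchesSpecificInt(const ConstValue &c, uint64_t v) {
  if (c.lanes.empty()) return false;
  for (uint64_t lane : c.lanes)
    if (lane != v) return false;
  return true;
}
```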

In D36395#836103, @craig.topper wrote:

/// \brief Match a specific integer value or vector with all elements equal to
/// the value.
inline specific_intval m_SpecificInt(uint64_t V) { return specific_intval(V); }

Ah, never noticed/used that. So yes, let me use that and update the assert to be clearer. Thanks!

Patch updated. No functional changes are intended from the previous revision, but the code is cleaner:

1. Use isa<VectorType> in the assert and in the caller's type check to make it clearer that vectors are allowed.
2. Use m_SpecificInt in pattern matches to reduce code.

LGTM

This revision is now accepted and ready to land. Aug 9 2017, 9:48 AM

Closed by commit rL310509: [InstCombine] narrow rotate left/right patterns to eliminate zext/trunc… (authored by spatel). Aug 9 2017, 11:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
lib/Transforms/InstCombine/InstCombineCasts.cpp     73 lines
lib/Transforms/InstCombine/InstCombineInternal.h    1 line
test/Transforms/InstCombine/rotate.ll               124 lines

Diff 110088

lib/Transforms/InstCombine/InstCombineCasts.cpp

[437 unchanged lines hidden; context: static Instruction *foldVecTruncToExtElt(TruncInst &Trunc, InstCombiner &IC) {]
   unsigned Elt = ShiftAmount / DestWidth;
   if (IC.getDataLayout().isBigEndian())
     Elt = NumVecElts - 1 - Elt;

   return ExtractElementInst::Create(VecInput, IC.Builder.getInt32(Elt));
 }

+/// Rotate left/right may occur in a wider type than necessary because of type
+/// promotion rules. Try to narrow all of the component instructions.
+Instruction *InstCombiner::narrowRotate(TruncInst &Trunc) {
+  assert((!isa<IntegerType>(Trunc.getSrcTy()) ||
+          shouldChangeType(Trunc.getSrcTy(), Trunc.getType())) &&
+         "Don't narrow to an illegal type");
+
+  // First, find an or'd pair of opposite shifts with the same shifted operand:
+  // trunc (or (lshr ShVal, ShAmt0), (shl ShVal, ShAmt1))
+  Value *Or0, *Or1;
+  if (!match(Trunc.getOperand(0), m_OneUse(m_Or(m_Value(Or0), m_Value(Or1)))))
+    return nullptr;
+
+  Value *ShVal, *ShAmt0, *ShAmt1;
+  if (!match(Or0, m_OneUse(m_LogicalShift(m_Value(ShVal), m_Value(ShAmt0)))) ||
+      !match(Or1, m_OneUse(m_LogicalShift(m_Specific(ShVal), m_Value(ShAmt1)))))
+    return nullptr;
+
+  auto ShiftOpcode0 = cast<BinaryOperator>(Or0)->getOpcode();
+  auto ShiftOpcode1 = cast<BinaryOperator>(Or1)->getOpcode();
+  if (ShiftOpcode0 == ShiftOpcode1)
+    return nullptr;
+
+  // The shift amounts must add up to the narrow bit width.
+  Value *ShAmt;
+  const APInt *BitWidthC;
+  bool SubIsOnLHS;
+  if (match(ShAmt0, m_OneUse(m_Sub(m_APInt(BitWidthC), m_Specific(ShAmt1))))) {
+    ShAmt = ShAmt1;
+    SubIsOnLHS = true;
+  } else if (match(ShAmt1,
+                   m_OneUse(m_Sub(m_APInt(BitWidthC), m_Specific(ShAmt0))))) {
+    ShAmt = ShAmt0;
+    SubIsOnLHS = false;
+  } else {
+    return nullptr;
+  }
+  Type *DestTy = Trunc.getType();
+  unsigned NarrowWidth = DestTy->getScalarSizeInBits();
+  if (NarrowWidth != *BitWidthC)
+    return nullptr;
+
+  // The shifted value must have high zeros in the wide type. Typically, this
+  // will be a zext, but it could also be the result of an 'and' or 'shift'.
+  unsigned WideWidth = Trunc.getSrcTy()->getScalarSizeInBits();
+  APInt HiBitMask = APInt::getHighBitsSet(WideWidth, WideWidth - NarrowWidth);
+  if (!MaskedValueIsZero(ShVal, HiBitMask, 0, &Trunc))
+    return nullptr;
+
+  // We have an unnecessarily wide rotate!
+  // trunc (or (lshr ShVal, ShAmt), (shl ShVal, BitWidth - ShAmt))
+  // Narrow it down to eliminate the zext/trunc:
+  // or (lshr trunc(ShVal), ShAmt0'), (shl trunc(ShVal), ShAmt1')
+  Value *NarrowShAmt = Builder.CreateTrunc(ShAmt, DestTy);
+  Value *NegShAmt = Builder.CreateNeg(NarrowShAmt);
+
+  // Mask both shift amounts to ensure there's no UB from oversized shifts.
+  Constant *MaskC = ConstantInt::get(DestTy, NarrowWidth - 1);
+  Value *MaskedShAmt = Builder.CreateAnd(NarrowShAmt, MaskC);
+  Value *MaskedNegShAmt = Builder.CreateAnd(NegShAmt, MaskC);
+
+  // Truncate the original value and use narrow ops.
+  Value *X = Builder.CreateTrunc(ShVal, DestTy);
+  Value *NarrowShAmt0 = SubIsOnLHS ? MaskedNegShAmt : MaskedShAmt;
+  Value *NarrowShAmt1 = SubIsOnLHS ? MaskedShAmt : MaskedNegShAmt;
+  Value *NarrowSh0 = Builder.CreateBinOp(ShiftOpcode0, X, NarrowShAmt0);
+  Value *NarrowSh1 = Builder.CreateBinOp(ShiftOpcode1, X, NarrowShAmt1);
+  return BinaryOperator::CreateOr(NarrowSh0, NarrowSh1);
+}
+
 /// Try to narrow the width of math or bitwise logic instructions by pulling a
 /// truncate ahead of binary operators.
 /// TODO: Transforms for truncated shifts should be moved into here.
 Instruction *InstCombiner::narrowBinOp(TruncInst &Trunc) {
   Type *SrcTy = Trunc.getSrcTy();
   Type *DestTy = Trunc.getType();
   if (isa<IntegerType>(SrcTy) && !shouldChangeType(SrcTy, DestTy))
     return nullptr;
[26 unchanged lines hidden; context: if (match(BinOp->getOperand(0), m_Constant(C))) {]
       return BinaryOperator::Create(BinOp->getOpcode(), NarrowC, TruncX);
     }
     break;
   }

   default: break;
   }

+  if (Instruction *NarrowOr = narrowRotate(Trunc))
+    return NarrowOr;
+
   return nullptr;
 }

 /// Try to narrow the width of a splat shuffle. This could be generalized to any
 /// shuffle with a constant operand, but we limit the transform to avoid
 /// creating a shuffle type that targets may not be able to lower effectively.
 static Instruction *shrinkSplatShuffle(TruncInst &Trunc,
                                        InstCombiner::BuilderTy &Builder) {
[1,768 unchanged lines hidden]
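Below is a standalone model of what the narrowed code in InstCombineCasts.cpp computes for an i8 rotate. It assumes the shift amounts are built as in that code: truncate, negate, then mask with NarrowWidth - 1. The flag `lshrHasSubAmount` is hypothetical. It records which shift carried the `BitWidth - ShAmt` operand, the role `SubIsOnLHS` plays when choosing NarrowShAmt0 and NarrowShAmt1. The comparison against the wide pattern covers only shift amounts 0 to 8. Outside that range, one of the i32 shifts is out of range and the wide result is poison, so the rewrite may return anything there.

```cpp
#include <cassert>
#include <cstdint>

// Narrowed i8 rotate: truncate the amount, negate it, mask both amounts with
// (NarrowWidth - 1) == 7, then use i8 shifts. lshrHasSubAmount (hypothetical)
// says whether the lshr was the shift whose wide amount was '8 - s'; that
// shift receives the masked negated amount.
uint8_t narrowRotate8(uint8_t v, uint32_t s, bool lshrHasSubAmount) {
  uint8_t amt = static_cast<uint8_t>(s);            // trunc
  uint8_t negAmt = static_cast<uint8_t>(0u - amt);  // sub 0, amt
  unsigned masked = amt & 7u;                       // and amt, 7
  unsigned maskedNeg = negAmt & 7u;                 // and negAmt, 7
  unsigned lshrAmt = lshrHasSubAmount ? maskedNeg : masked;
  unsigned shlAmt = lshrHasSubAmount ? masked : maskedNeg;
  uint8_t shr = static_cast<uint8_t>(v >> lshrAmt);
  uint8_t shl = static_cast<uint8_t>(v << shlAmt);
  return static_cast<uint8_t>(shr | shl);
}

// Compare against the wide i32 pattern for both rotate directions. Only
// s in [0, 8] is checked: for larger s, at least one i32 shift amount is
// >= 32, so the wide IR result is poison.
bool narrowMatchesWide() {
  for (uint32_t v = 0; v < 256; ++v) {
    for (uint32_t s = 0; s <= 8; ++s) {
      // Rotate left: lshr by (8 - s), shl by s (as in rotate8_not_safe).
      uint8_t wideLeft = static_cast<uint8_t>((v >> (8 - s)) | (v << s));
      // Rotate right: lshr by s, shl by (8 - s).
      uint8_t wideRight = static_cast<uint8_t>((v >> s) | (v << (8 - s)));
      uint8_t x = static_cast<uint8_t>(v);
      if (narrowRotate8(x, s, true) != wideLeft) return false;
      if (narrowRotate8(x, s, false) != wideRight) return false;
    }
  }
  return true;
}
```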

lib/Transforms/InstCombine/InstCombineInternal.h

[434 unchanged lines hidden; context: bool willNotOverflowUnsignedMul(const Value *LHS, const Value *RHS,]
     return computeOverflowForUnsignedMul(LHS, RHS, &CxtI) ==
            OverflowResult::NeverOverflows;
   };
   Value *EmitGEPOffset(User *GEP);
   Instruction *scalarizePHI(ExtractElementInst &EI, PHINode *PN);
   Value *EvaluateInDifferentElementOrder(Value *V, ArrayRef<int> Mask);
   Instruction *foldCastedBitwiseLogic(BinaryOperator &I);
   Instruction *narrowBinOp(TruncInst &Trunc);
+  Instruction *narrowRotate(TruncInst &Trunc);
   Instruction *optimizeBitCastFromPhi(CastInst &CI, PHINode *PN);

   /// Determine if a pair of casts can be replaced by a single cast.
   ///
   /// \param CI1 The first of a pair of casts.
   /// \param CI2 The second of a pair of casts.
   ///
   /// \return 0 if the cast pair cannot be eliminated, otherwise returns an
[313 unchanged lines hidden]

test/Transforms/InstCombine/rotate.ll

				; RUN: opt < %s -instcombine -S | FileCheck %s

				target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"

				; These are UB-free rotate left/right patterns that are narrowed to a smaller bitwidth.
				; See PR34046 and PR16726 for motivating examples:
				; https://bugs.llvm.org/show_bug.cgi?id=34046
				; https://bugs.llvm.org/show_bug.cgi?id=16726

				define i16 @rotate_left_16bit(i16 %v, i32 %shift) {
				; CHECK-LABEL: @rotate_left_16bit(
				; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 %shift to i16
				; CHECK-NEXT: [[TMP2:%.*]] = and i16 [[TMP1]], 15
				; CHECK-NEXT: [[TMP3:%.*]] = sub i16 0, [[TMP1]]
				; CHECK-NEXT: [[TMP4:%.*]] = and i16 [[TMP3]], 15
				; CHECK-NEXT: [[TMP5:%.*]] = lshr i16 %v, [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = shl i16 %v, [[TMP2]]
				; CHECK-NEXT: [[CONV2:%.*]] = or i16 [[TMP5]], [[TMP6]]
				; CHECK-NEXT: ret i16 [[CONV2]]
				;
				%and = and i32 %shift, 15
				%conv = zext i16 %v to i32
				%shl = shl i32 %conv, %and
				%sub = sub i32 16, %and
				%shr = lshr i32 %conv, %sub
				%or = or i32 %shr, %shl
				%conv2 = trunc i32 %or to i16
				ret i16 %conv2
				}

				; Commute the 'or' operands and try a vector type.

				define <2 x i16> @rotate_left_commute_16bit_vec(<2 x i16> %v, <2 x i32> %shift) {
				; CHECK-LABEL: @rotate_left_commute_16bit_vec(
				; CHECK-NEXT: [[TMP1:%.*]] = trunc <2 x i32> %shift to <2 x i16>
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i16> [[TMP1]], <i16 15, i16 15>
				; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i16> zeroinitializer, [[TMP1]]
				; CHECK-NEXT: [[TMP4:%.*]] = and <2 x i16> [[TMP3]], <i16 15, i16 15>
				; CHECK-NEXT: [[TMP5:%.*]] = shl <2 x i16> %v, [[TMP2]]
				; CHECK-NEXT: [[TMP6:%.*]] = lshr <2 x i16> %v, [[TMP4]]
				; CHECK-NEXT: [[CONV2:%.*]] = or <2 x i16> [[TMP5]], [[TMP6]]
				; CHECK-NEXT: ret <2 x i16> [[CONV2]]
				;
				%and = and <2 x i32> %shift, <i32 15, i32 15>
				%conv = zext <2 x i16> %v to <2 x i32>
				%shl = shl <2 x i32> %conv, %and
				%sub = sub <2 x i32> <i32 16, i32 16>, %and
				%shr = lshr <2 x i32> %conv, %sub
				%or = or <2 x i32> %shl, %shr
				%conv2 = trunc <2 x i32> %or to <2 x i16>
				ret <2 x i16> %conv2
				}

				; Change the size, rotation direction (the subtract is on the left-shift), and mask op.

				define i8 @rotate_right_8bit(i8 %v, i3 %shift) {
				; CHECK-LABEL: @rotate_right_8bit(
				; CHECK-NEXT: [[TMP1:%.*]] = zext i3 %shift to i8
				; CHECK-NEXT: [[TMP2:%.*]] = sub i3 0, %shift
				; CHECK-NEXT: [[TMP3:%.*]] = zext i3 [[TMP2]] to i8
				; CHECK-NEXT: [[TMP4:%.*]] = shl i8 %v, [[TMP3]]
				; CHECK-NEXT: [[TMP5:%.*]] = lshr i8 %v, [[TMP1]]
				; CHECK-NEXT: [[CONV2:%.*]] = or i8 [[TMP4]], [[TMP5]]
				; CHECK-NEXT: ret i8 [[CONV2]]
				;
				%and = zext i3 %shift to i32
				%conv = zext i8 %v to i32
				%shr = lshr i32 %conv, %and
				%sub = sub i32 8, %and
				%shl = shl i32 %conv, %sub
				%or = or i32 %shl, %shr
				%conv2 = trunc i32 %or to i8
				ret i8 %conv2
				}

				; The shifted value does not need to be a zexted value; here it is masked.
				; The shift mask could be less than the bitwidth, but this is still ok.

				define i8 @rotate_right_commute_8bit(i32 %v, i32 %shift) {
				; CHECK-LABEL: @rotate_right_commute_8bit(
				; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 %shift to i8
				; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], 3
				; CHECK-NEXT: [[TMP3:%.*]] = sub nsw i8 0, [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[TMP3]], 7
				; CHECK-NEXT: [[TMP5:%.*]] = trunc i32 %v to i8
				; CHECK-NEXT: [[TMP6:%.*]] = lshr i8 [[TMP5]], [[TMP2]]
				; CHECK-NEXT: [[TMP7:%.*]] = shl i8 [[TMP5]], [[TMP4]]
				; CHECK-NEXT: [[CONV2:%.*]] = or i8 [[TMP6]], [[TMP7]]
				; CHECK-NEXT: ret i8 [[CONV2]]
				;
				%and = and i32 %shift, 3
				%conv = and i32 %v, 255
				%shr = lshr i32 %conv, %and
				%sub = sub i32 8, %and
				%shl = shl i32 %conv, %sub
				%or = or i32 %shr, %shl
				%conv2 = trunc i32 %or to i8
				ret i8 %conv2
				}

				; If the original source does not mask the shift amount,
				; we still do the transform by adding masks to make it safe.

				define i8 @rotate8_not_safe(i8 %v, i32 %shamt) {
				; CHECK-LABEL: @rotate8_not_safe(
				; CHECK-NEXT: [[TMP1:%.*]] = trunc i32 %shamt to i8
				; CHECK-NEXT: [[TMP2:%.*]] = sub i8 0, [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = and i8 [[TMP1]], 7
				; CHECK-NEXT: [[TMP4:%.*]] = and i8 [[TMP2]], 7
				; CHECK-NEXT: [[TMP5:%.*]] = lshr i8 %v, [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = shl i8 %v, [[TMP3]]
				; CHECK-NEXT: [[RET:%.*]] = or i8 [[TMP5]], [[TMP6]]
				; CHECK-NEXT: ret i8 [[RET]]
				;
				%conv = zext i8 %v to i32
				%sub = sub i32 8, %shamt
				%shr = lshr i32 %conv, %sub
				%shl = shl i32 %conv, %shamt
				%or = or i32 %shr, %shl
				%ret = trunc i32 %or to i8
				ret i8 %ret
				}
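The `rotate_right_commute_8bit` case, where the source mask (3) is narrower than the bit width, can be spot-checked the same way. The sketch below transcribes that test's input and its CHECK lines into C++ and compares them over a range of inputs. The function names are hypothetical. The `nsw` on the subtraction is valid because the amount is known to lie in [0, 3], so `0 - amt` cannot wrap in the signed sense.

```cpp
#include <cassert>
#include <cstdint>

// Input IR of rotate_right_commute_8bit, evaluated in 32 bits.
uint8_t commuteWide(uint32_t v, uint32_t shift) {
  uint32_t amt = shift & 3;                              // %and: in [0, 3]
  uint32_t conv = v & 255;                               // %conv: top 24 bits zero
  uint32_t orVal = (conv >> amt) | (conv << (8 - amt));  // %shr | %shl
  return static_cast<uint8_t>(orVal);                    // %conv2 = trunc to i8
}

// The narrowed i8 sequence from the CHECK lines.
uint8_t commuteNarrow(uint32_t v, uint32_t shift) {
  uint8_t amt = static_cast<uint8_t>(shift) & 3;        // TMP1, TMP2
  uint8_t negAmt = static_cast<uint8_t>(0u - amt) & 7;  // TMP3, TMP4
  uint8_t x = static_cast<uint8_t>(v);                  // TMP5
  uint8_t shr = static_cast<uint8_t>(x >> amt);         // TMP6
  uint8_t shl = static_cast<uint8_t>(x << negAmt);      // TMP7
  return static_cast<uint8_t>(shr | shl);               // CONV2
}

// Compare both forms, including inputs with high bits set that the 'and'
// and 'trunc' must discard.
bool commuteMatches() {
  for (uint32_t v = 0; v < 1024; ++v)
    for (uint32_t shift = 0; shift < 64; ++shift)
      if (commuteWide(v, shift) != commuteNarrow(v, shift)) return false;
  return true;
}
```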