This is an archive of the discontinued LLVM Phabricator instance.

Differential D115969

[Support] improve known bits analysis for leading zeros of multiply
Closed, Public

Authored by spatel on Dec 17 2021, 1:34 PM.


Details

Reviewers

lebedev.ri
efriedma
RKSimon
craig.topper

Commits

rG892c731681df: [Support] improve known bits analysis for leading zeros of multiply

Summary

Instead of summing leading zeros on the input operands, multiply the max possible values of those inputs and count the leading zeros of the result. This can give us an extra zero bit (typically in cases where one of the operands is a known constant).

This allows folding away the remaining 'add' ops in the motivating bug (modeled in the PhaseOrdering IR test):
https://github.com/llvm/llvm-project/issues/48399

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Dec 17 2021, 1:34 PM

Herald added subscribers: dexonsmith, pengfei, hiraditya, mcrosier. Dec 17 2021, 1:34 PM

spatel requested review of this revision. Dec 17 2021, 1:34 PM

Herald added a project: Restricted Project. Dec 17 2021, 1:34 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B139913: Diff 395213. Dec 17 2021, 1:34 PM

spatel edited the summary of this revision. Dec 17 2021, 1:42 PM

RKSimon added inline comments. Dec 18 2021, 2:44 PM

llvm/lib/Support/KnownBits.cpp
428–435	I'm not certain, but this assert (and the max() feeding it) doesn't look quite right - please can you double check it?

spatel added a reviewer: craig.topper. Dec 19 2021, 5:50 AM

spatel added inline comments.

llvm/lib/Support/KnownBits.cpp
428–435	I think it's correct (otherwise, the exhaustive unit tests for mul should find an error). As an example, try a 4-bit mul, 0??? * 00??:
Max wide product: 0000_0111 * 0000_0011 = 0001_0101 (7 * 3 = 21)
Leading zeros of wide max product = 3
Leading zeros of actual result = max(3, 4) - 4 = 0 (there are no LZ in case of overflow)
But this does suggest a more efficient (and possibly easier to read) implementation: use "umul_ov" instead of widening. Let me update and see if that looks better.

Patch updated: use multiply with overflow instead of wide multiply for efficiency (and possibly easier to read). This obsoletes any opportunities for an assert AFAICT.

Harbormaster completed remote builds in B140005: Diff 395326. Dec 19 2021, 6:22 AM

LG
We *really* need precision tests for KnownBits / ConstantRange.

This revision is now accepted and ready to land. Dec 19 2021, 6:24 AM

In D115969#3201733, @lebedev.ri wrote:

LG
We *really* need precision tests for KnownBits / ConstantRange.

Thanks!

Side note: I was wondering how we handle the known sign bit for mul; it is not done here, but it is in the caller code in ValueTracking's computeKnownBitsMul(). I'm not sure if there's a reason for that.

Thanks for checking! LGTM - cheers

This revision was landed with ongoing or failed builds. Dec 20 2021, 6:12 AM

Closed by commit rG892c731681df: [Support] improve known bits analysis for leading zeros of multiply (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG892c731681df: [Support] improve known bits analysis for leading zeros of multiply.

Revision Contents

Path	Size
llvm/lib/Support/KnownBits.cpp	23 lines
llvm/test/CodeGen/X86/mul128.ll	13 lines
llvm/test/Transforms/InstCombine/icmp-mul.ll	9 lines
llvm/test/Transforms/InstCombine/narrow-switch.ll	12 lines
llvm/test/Transforms/PhaseOrdering/X86/pixel-splat.ll	33 lines

Diff 395429

llvm/lib/Support/KnownBits.cpp

	KnownBits KnownBits::mul(const KnownBits &LHS, const KnownBits &RHS,			KnownBits KnownBits::mul(const KnownBits &LHS, const KnownBits &RHS,
	bool SelfMultiply) {			bool SelfMultiply) {
	unsigned BitWidth = LHS.getBitWidth();			unsigned BitWidth = LHS.getBitWidth();
	assert(BitWidth == RHS.getBitWidth() && !LHS.hasConflict() &&			assert(BitWidth == RHS.getBitWidth() && !LHS.hasConflict() &&
	!RHS.hasConflict() && "Operand mismatch");			!RHS.hasConflict() && "Operand mismatch");
	assert((!SelfMultiply || (LHS.One == RHS.One && LHS.Zero == RHS.Zero)) &&			assert((!SelfMultiply || (LHS.One == RHS.One && LHS.Zero == RHS.Zero)) &&
	"Self multiplication knownbits mismatch");			"Self multiplication knownbits mismatch");

	// Compute a conservative estimate for high known-0 bits.			// Compute the high known-0 bits by multiplying the unsigned max of each side.
				// Conservatively, M active bits * N active bits results in M + N bits in the
				// result. But if we know a value is a power-of-2 for example, then this
				// computes one more leading zero.
	// TODO: This could be generalized to number of sign bits (negative numbers).			// TODO: This could be generalized to number of sign bits (negative numbers).
	unsigned LHSLeadZ = LHS.countMinLeadingZeros();			APInt UMaxLHS = LHS.getMaxValue();
	unsigned RHSLeadZ = RHS.countMinLeadingZeros();			APInt UMaxRHS = RHS.getMaxValue();

	// If either operand is a power-of-2, the multiply is only shifting bits in			// For leading zeros in the result to be valid, the unsigned max product must
	// the other operand (there can't be a carry into the M+N bit of the result).			// fit in the bitwidth (it must not overflow).
	// Note: if we know that a value is entirely 0, that should simplify below.			bool HasOverflow;
	bool BonusLZ = LHS.countMaxPopulation() == 1 \|\| RHS.countMaxPopulation() == 1;			APInt UMaxResult = UMaxLHS.umul_ov(UMaxRHS, HasOverflow);
				unsigned LeadZ = HasOverflow ? 0 : UMaxResult.countLeadingZeros();
	unsigned LeadZ = std::max(LHSLeadZ + RHSLeadZ + BonusLZ, BitWidth) - BitWidth;
	assert(LeadZ <= BitWidth && "More zeros than bits?");

	// The result of the bottom bits of an integer multiply can be			// The result of the bottom bits of an integer multiply can be
	// inferred by looking at the bottom bits of both operands and			// inferred by looking at the bottom bits of both operands and
	// multiplying them together.			// multiplying them together.
	// We can infer at least the minimum number of known trailing bits			// We can infer at least the minimum number of known trailing bits
	// of both operands. Depending on number of trailing zeros, we can			// of both operands. Depending on number of trailing zeros, we can
	// infer more bits, because (ab) <=> ((a/m) (b/n)) * (m*n) assuming			// infer more bits, because (ab) <=> ((a/m) (b/n)) * (m*n) assuming
	// a and b are divisible by m and n respectively.			// a and b are divisible by m and n respectively.

llvm/test/CodeGen/X86/mul128.ll

	}			}

	@aaa = external dso_local global i128			@aaa = external dso_local global i128
	@bbb = external dso_local global i128			@bbb = external dso_local global i128

	define void @PR13897() nounwind {			define void @PR13897() nounwind {
	; X64-LABEL: PR13897:			; X64-LABEL: PR13897:
	; X64: # %bb.0: # %"0x0"			; X64: # %bb.0: # %"0x0"
	; X64-NEXT: movl bbb(%rip), %ecx			; X64-NEXT: movl bbb(%rip), %eax
	; X64-NEXT: movabsq $4294967297, %rdx # imm = 0x100000001			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: mulq %rdx
	; X64-NEXT: addq %rcx, %rdx
	; X64-NEXT: shlq $32, %rcx			; X64-NEXT: shlq $32, %rcx
	; X64-NEXT: addq %rcx, %rdx			; X64-NEXT: orq %rax, %rcx
	; X64-NEXT: movq %rax, aaa(%rip)			; X64-NEXT: movq %rcx, aaa+8(%rip)
	; X64-NEXT: movq %rdx, aaa+8(%rip)			; X64-NEXT: movq %rcx, aaa(%rip)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: PR13897:			; X86-LABEL: PR13897:
	; X86: # %bb.0: # %"0x0"			; X86: # %bb.0: # %"0x0"
	; X86-NEXT: movl bbb, %eax			; X86-NEXT: movl bbb, %eax
	; X86-NEXT: movl %eax, aaa+12			; X86-NEXT: movl %eax, aaa+12
	; X86-NEXT: movl %eax, aaa+8			; X86-NEXT: movl %eax, aaa+8
	; X86-NEXT: movl %eax, aaa+4			; X86-NEXT: movl %eax, aaa+4

llvm/test/Transforms/InstCombine/icmp-mul.ll

	;			;
	%b = and i32 %x, 2			%b = and i32 %x, 2
	%s = sext i8 %y to i32			%s = sext i8 %y to i32
	%m = mul nuw nsw i32 %b, %s			%m = mul nuw nsw i32 %b, %s
	%r = icmp sgt i32 %m, 254			%r = icmp sgt i32 %m, 254
	ret i1 %r			ret i1 %r
	}			}

				; The top 32-bits must be zero.

	define i1 @splat_mul_known_lz(i32 %x) {			define i1 @splat_mul_known_lz(i32 %x) {
	; CHECK-LABEL: @splat_mul_known_lz(			; CHECK-LABEL: @splat_mul_known_lz(
	; CHECK-NEXT: [[Z:%.*]] = zext i32 [[X:%.*]] to i128			; CHECK-NEXT: ret i1 true
	; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i128 [[Z]], 18446744078004518913
	; CHECK-NEXT: [[R:%.*]] = icmp ult i128 [[M]], 79228162514264337593543950336
	; CHECK-NEXT: ret i1 [[R]]
	;			;
	%z = zext i32 %x to i128			%z = zext i32 %x to i128
	%m = mul i128 %z, 18446744078004518913 ; 0x00000000_00000001_00000001_00000001			%m = mul i128 %z, 18446744078004518913 ; 0x00000000_00000001_00000001_00000001
	%s = lshr i128 %m, 96			%s = lshr i128 %m, 96
	%r = icmp eq i128 %s, 0			%r = icmp eq i128 %s, 0
	ret i1 %r			ret i1 %r
	}			}

				; Negative test - the 33rd bit could be set.

	define i1 @splat_mul_unknown_lz(i32 %x) {			define i1 @splat_mul_unknown_lz(i32 %x) {
	; CHECK-LABEL: @splat_mul_unknown_lz(			; CHECK-LABEL: @splat_mul_unknown_lz(
	; CHECK-NEXT: [[Z:%.*]] = zext i32 [[X:%.*]] to i128			; CHECK-NEXT: [[Z:%.*]] = zext i32 [[X:%.*]] to i128
	; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i128 [[Z]], 18446744078004518913			; CHECK-NEXT: [[M:%.*]] = mul nuw nsw i128 [[Z]], 18446744078004518913
	; CHECK-NEXT: [[R:%.*]] = icmp ult i128 [[M]], 39614081257132168796771975168			; CHECK-NEXT: [[R:%.*]] = icmp ult i128 [[M]], 39614081257132168796771975168
	; CHECK-NEXT: ret i1 [[R]]			; CHECK-NEXT: ret i1 [[R]]
	;			;
	%z = zext i32 %x to i128			%z = zext i32 %x to i128
	%m = mul i128 %z, 18446744078004518913 ; 0x00000000_00000001_00000001_00000001			%m = mul i128 %z, 18446744078004518913 ; 0x00000000_00000001_00000001_00000001
	%s = lshr i128 %m, 95			%s = lshr i128 %m, 95
	%r = icmp eq i128 %s, 0			%r = icmp eq i128 %s, 0
	ret i1 %r			ret i1 %r
	}			}

llvm/test/Transforms/InstCombine/narrow-switch.ll

	return:			return:
	%retval.0 = phi i32 [ 24, %sw.default ], [ 123, %sw.bb2 ], [ 213, %sw.bb1 ], [ 231, %entry ]			%retval.0 = phi i32 [ 24, %sw.default ], [ 123, %sw.bb2 ], [ 213, %sw.bb1 ], [ 231, %entry ]
	ret i32 %retval.0			ret i32 %retval.0
	}			}

	; Make sure to avoid assertion crashes and use the type before			; Make sure to avoid assertion crashes and use the type before
	; truncation to generate the sub constant expressions that leads			; truncation to generate the sub constant expressions that leads
	; to the recomputed condition.			; to the recomputed condition.
	; We allow to truncate from i64 to i59 if in 32-bit mode,			; We allow truncate from i64 to i58 if in 32-bit mode,
	; because both are illegal.			; because both are illegal.

	define void @trunc64to59(i64 %a) {			define void @trunc64to58(i64 %a) {
	; ALL-LABEL: @trunc64to59(			; ALL-LABEL: @trunc64to58(
	; CHECK32: switch i59			; CHECK32: switch i58
	; CHECK32-NEXT: i59 0, label %sw.bb1			; CHECK32-NEXT: i58 0, label %sw.bb1
	; CHECK32-NEXT: i59 18717182647723699, label %sw.bb2			; CHECK32-NEXT: i58 18717182647723699, label %sw.bb2
	; CHECK32-NEXT: ]			; CHECK32-NEXT: ]
	; CHECK64: switch i64			; CHECK64: switch i64
	; CHECK64-NEXT: i64 0, label %sw.bb1			; CHECK64-NEXT: i64 0, label %sw.bb1
	; CHECK64-NEXT: i64 18717182647723699, label %sw.bb2			; CHECK64-NEXT: i64 18717182647723699, label %sw.bb2
	; CHECK64-NEXT: ]			; CHECK64-NEXT: ]
	;			;
	entry:			entry:
	%tmp0 = and i64 %a, 15			%tmp0 = and i64 %a, 15

llvm/test/Transforms/PhaseOrdering/X86/pixel-splat.ll

	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, i8* [[PIN:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, i8* [[PIN:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x i8>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP1]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP0]], i64 4			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP0]], i64 4
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
	; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1			; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[WIDE_LOAD]] to <4 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[WIDE_LOAD]] to <4 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[WIDE_LOAD4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[WIDE_LOAD4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = mul nuw nsw <4 x i32> [[TMP4]], <i32 65792, i32 65792, i32 65792, i32 65792>			; CHECK-NEXT: [[TMP6:%.*]] = mul nuw nsw <4 x i32> [[TMP4]], <i32 65793, i32 65793, i32 65793, i32 65793>
	; CHECK-NEXT: [[TMP7:%.*]] = mul nuw nsw <4 x i32> [[TMP5]], <i32 65792, i32 65792, i32 65792, i32 65792>			; CHECK-NEXT: [[TMP7:%.*]] = mul nuw nsw <4 x i32> [[TMP5]], <i32 65793, i32 65793, i32 65793, i32 65793>
	; CHECK-NEXT: [[TMP8:%.*]] = or <4 x i32> [[TMP4]], <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>			; CHECK-NEXT: [[TMP8:%.*]] = or <4 x i32> [[TMP6]], <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
	; CHECK-NEXT: [[TMP9:%.*]] = or <4 x i32> [[TMP5]], <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>			; CHECK-NEXT: [[TMP9:%.*]] = or <4 x i32> [[TMP7]], <i32 -16777216, i32 -16777216, i32 -16777216, i32 -16777216>
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <4 x i32> [[TMP8]], [[TMP6]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[POUT:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP11:%.*]] = add nsw <4 x i32> [[TMP9]], [[TMP7]]			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[POUT:%.*]], i64 [[INDEX]]			; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP11]], align 4
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i64 4
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP13]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP13]], align 4
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TMP12]], i64 4
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* [[TMP15]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER5]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER5]]
	; CHECK: for.body.preheader5:			; CHECK: for.body.preheader5:
	; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER5]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER5]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[PIN]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[PIN]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP17]] to i32			; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP15]] to i32
	; CHECK-NEXT: [[REASS_MUL:%.*]] = mul nuw nsw i32 [[CONV]], 65792			; CHECK-NEXT: [[OR2:%.*]] = mul nuw nsw i32 [[CONV]], 65793
	; CHECK-NEXT: [[OR2:%.*]] = or i32 [[CONV]], -16777216			; CHECK-NEXT: [[OR3:%.*]] = or i32 [[OR2]], -16777216
	; CHECK-NEXT: [[OR3:%.*]] = add nsw i32 [[OR2]], [[REASS_MUL]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[POUT]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, i32* [[POUT]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[OR3]], i32* [[ARRAYIDX5]], align 4			; CHECK-NEXT: store i32 [[OR3]], i32* [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;