This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] reverse 'trunc X to <N x i1>' canonicalization
ClosedPublic

Authored by spatel on Oct 1 2018, 2:02 PM.

Details

Summary

icmp ne (and X, 1), 0 --> trunc X to <N x i1>
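
For reference, a minimal vector sketch of that fold (the value names are hypothetical, not taken from the patch's tests):

%a = and <4 x i32> %X, <i32 1, i32 1, i32 1, i32 1>
%r = icmp ne <4 x i32> %a, zeroinitializer

becomes:

%r = trunc <4 x i32> %X to <4 x i1>

Both forms test only bit 0 of each element, since trunc to i1 keeps just the low bit.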

Ideally, I think we'd do the same for scalars, but I'm afraid of unintended consequences.
The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision. Oct 1 2018, 2:02 PM
craig.topper added inline comments. Oct 1 2018, 11:17 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
1712 ↗(On Diff #167820)

Should this be in foldICmpAndConstConst? And it should use the APInts we already extracted?

Ideally, I think we'd do the same for scalars, but I'm afraid of unintended consequences.

I *think* this originates from rL67635.

spatel added a comment. Oct 2 2018, 7:24 AM

Ideally, I think we'd do the same for scalars, but I'm afraid of unintended consequences.

I *think* this originates from rL67635.

Thanks for digging that up! As the codegen examples here show, the icmp variant is not always better, at least for vector codegen (we could just fix the backend, but since this also reduces the IR, I figured that's the better option).

My bigger worry is that we're going to expose IR-level holes for trunc patterns in scalar code (and-of-icmps and similar, or the shift example now shown here). Those seem less likely for vector code.
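
For illustration of such a hole (a hypothetical scalar example, not taken from the patch's tests), an and-of-icmps pattern like:

%a1 = and i8 %x, 1
%c1 = icmp ne i8 %a1, 0
%a2 = and i8 %y, 1
%c2 = icmp ne i8 %a2, 0
%r = and i1 %c1, %c2

is equivalent to the trunc-based form:

%t1 = trunc i8 %x to i1
%t2 = trunc i8 %y to i1
%r = and i1 %t1, %t2

and folds that match the icmp spelling won't necessarily fire on the trunc spelling.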

lib/Transforms/InstCombine/InstCombineCompares.cpp
1712 ↗(On Diff #167820)

There are 2 independent problems here, and I should have put this in a code comment:

  1. If we use the already extracted APInt values, we won't handle vectors with undefs because m_APInt doesn't match those (yet). So in cases like this, I've been using the more specific matcher even if it looks redundant. I will add a test that includes undefs in the constant vector values.
  2. Depending on where we position this fold, it exposes another canonicalization question because it will affect patterns with shifts like this:
%shr = ashr <2 x i84> %X, <i84 4, i84 4>
%and = and <2 x i84> %shr, <i84 1, i84 1>
%cmp = icmp ne <2 x i84> %and, zeroinitializer

Should that become:

%m = and <2 x i84> %X, <i84 16, i84 16>
%cmp = icmp ne <2 x i84> %m, zeroinitializer

or:

%sh = lshr <2 x i84> %X, <i84 4, i84 4>
%cmp = trunc <2 x i84> %sh to <2 x i1>

This patch sidesteps that question by allowing the larger pattern to match first, but we could make that a prerequisite step for this patch.

spatel added a comment. Oct 2 2018, 7:57 AM

If we use the already extracted APInt values, we won't handle vectors with undefs because m_APInt doesn't match those (yet).
So in cases like this, I've been using the more specific matcher even if it looks redundant.
I will add a test that includes undefs in the constant vector values.

Looking at this closer... as the patch is written currently, we would fail to match if the compare constant (zero) contains undefs, because we already used m_APInt as a condition to get here in the first place.
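
For example (a hypothetical constant vector, not one of the added tests), a zero compare constant that contains an undef lane would be skipped by that m_APInt guard:

%a = and <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
%c = icmp ne <4 x i32> %a, <i32 0, i32 undef, i32 0, i32 0>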

spatel updated this revision to Diff 168059. Oct 2 2018, 5:43 PM

Patch updated:

  1. Moved the and+icmp --> trunc transform earlier in visitICmpInst, so we have a better idea about potential regressions.
  2. This required adding 2 trunc folds to avoid known regressions. These transforms have phantom (cosmetic-only) test diffs in apint-shift.ll and icmp.ll::icmp_and_or_lshr_cst_vec(), so we can see that the new code is firing on the patterns with trunc.
  3. The other test diffs are all wins in IR (fewer instructions). Included in that, we see that one of the existing icmp transforms that we're replacing doesn't work if the operands are commuted.
  4. The loop vectorizer tests (running with full -O3 in that test file...) produce mixed codegen results: both tests improve (fewer instructions in the inner loop) on KNL, but regress (more instructions in the inner loop) with AVX2. Note: that diff should've been in the previous rev of this patch, but I missed it.

So we have IR improvements in all cases shown here (but there could be regressions for patterns that have no vector test coverage), codegen improvements for the motivating blendv cases across a range of targets, codegen improvements on larger loop tests on AVX512, but codegen regressions with that same IR on AVX2.

@craig.topper The final codegen from the updated IR in masked_load_store.ll regresses because the masked stores don't take advantage of only requiring the MSB of each mask element (lots of SIGN_EXTEND_INREG etc.). X86ISelLowering's combineMaskedStore only handles the PCMPGT case; how tricky would it be to replace it with a general SimplifyDemandedBits call?

spatel added a comment. Oct 6 2018, 6:03 AM

@craig.topper The final codegen from the updated IR in masked_load_store.ll regresses because the masked stores don't take advantage of only requiring the MSB of each mask element (lots of SIGN_EXTEND_INREG etc.). X86ISelLowering's combineMaskedStore only handles the PCMPGT case; how tricky would it be to replace it with a general SimplifyDemandedBits call?

I have a draft of that patch in progress. Let me add some tests, clean it up, and post it. That's the only regression that I'm aware of from this patch, so we can make this patch dependent on that one.

Now that D52964 has landed - is there anything stopping this?

spatel added a comment. Oct 9 2018, 7:18 AM

Now that D52964 has landed - is there anything stopping this?

IMO, no. The known IR regressions are now handled with additional trunc pattern matching, so all changes in IR shown here are improvements. All known codegen regressions have been squashed.

RKSimon accepted this revision. Oct 9 2018, 8:02 AM

LGTM - thanks

This revision is now accepted and ready to land. Oct 9 2018, 8:02 AM
This revision was automatically updated to reflect the committed changes.