This is an archive of the discontinued LLVM Phabricator instance.

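Paths

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll
llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
llvm/test/Analysis/CostModel/X86/masked-load-i16.ll
llvm/test/Analysis/CostModel/X86/masked-load-i32.ll
llvm/test/Analysis/CostModel/X86/masked-load-i64.ll
llvm/test/Analysis/CostModel/X86/masked-load-i8.ll
llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll
llvm/test/Transforms/LoopVectorize/optsize.ll
llvm/test/Transforms/LoopVectorize/tripcount.ll
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll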

Differential D114779

[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
Abandoned, Public

Authored by lebedev.ri on Nov 30 2021, 1:11 AM.

Details

Reviewers

rengolin
RKSimon
fhahn
hsaito
hjyamauchi
Ayal
sdesmalen
simoll
craig.topper

Commits

rG77a0da926c9e: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`

Summary

D43208 extracted useEmulatedMaskMemRefHack() from legality into the cost model.
What it essentially does is prevent scalarized vectorization of masked memory operations:

// TODO: Cost model for emulated masked load/store is completely
// broken. This hack guides the cost model to use an artificially
// high enough value to practically disable vectorization with such
// operations, except where previously deployed legality hack allowed
// using very low cost values. This is to avoid regressions coming simply
// from moving "masked load/store" check from legality to cost model.
// Masked Load/Gather emulation was previously never allowed.
// Limited number of Masked Store/Scatter emulation was allowed.

While i don't really understand what specifically that "completely broken" comment
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and likewise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.
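
To make the effect of the hack concrete, here is a minimal standalone model of the decision it makes, based on the removed function shown in the diff further down. The names `MemOp`, `usesHackCost` and `scalarizedMemOpCost` are hypothetical and exist only for this sketch; the threshold of 1 mirrors the default of the `vectorize-num-stores-pred` option, and 3000000 is the constant from the removed code.

```cpp
#include <cstdint>

// Hypothetical stand-in for "is this predicated instruction a load or a store".
enum class MemOp { Load, Store };

// Models the removed condition: every predicated load gets the hacked cost,
// and predicated stores get it once the loop has more of them than allowed.
bool usesHackCost(MemOp op, unsigned numPredStores,
                  unsigned maxPredStores = 1) {
  return op == MemOp::Load ||
         (op == MemOp::Store && numPredStores > maxPredStores);
}

// Models the cost override: the modelled scalarization cost is discarded and
// replaced by a value high enough to practically disable vectorization.
std::uint64_t scalarizedMemOpCost(MemOp op, unsigned numPredStores,
                                  std::uint64_t modelledCost) {
  return usesHackCost(op, numPredStores) ? 3000000 : modelledCost;
}
```

Under this model a single predicated store keeps its modelled cost, while any predicated load, or a second predicated store, makes the loop look prohibitively expensive regardless of what the target cost model reports.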

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Nov 30 2021, 1:11 AM

Herald added subscribers: pengfei, arphaman, hiraditya. Nov 30 2021, 1:11 AM

lebedev.ri requested review of this revision. Nov 30 2021, 1:11 AM

Harbormaster completed remote builds in B136625: Diff 390609. Nov 30 2021, 1:46 AM

Ok, so i'm looking at those optsize.ll/tripcount.ll tests, and i'm not sure what exactly they are testing.
They don't specify the triple/attributes, so what costs do they expect to get?

I think what was happening is that they relied on the fact that opt doesn't default to -march=native,
and no machine supports masked gather/scatter/load/store for those types (i8) in its baseline ISA,
so the useEmulatedMaskMemRefHack() hack would always kick in, the instruction costs (and thus the vectorization cost)
would be bogusly high, and that would prevent the loops from vectorizing?

I'm not really sure what to do about them. Is my analysis wrong?
Can perhaps someone familiar with those tests comment?

tripcount.ll was added in D32451 and modified in D42946;
cc @twoh, @tejohnson, @davidxl, @mtrofin, @yamauchi

Hide the problem by defaulting TargetTransformInfoImplBase::useEmulatedMaskMemRefHack() to true.
If no triple is specifically requested, that is the TTI that is used.
This highlights that those tests are somewhat of a lie,
and raises questions about the implementation of those features.

Harbormaster completed remote builds in B136999: Diff 391129. Dec 1 2021, 2:35 PM

In D114779#3165305, @lebedev.ri wrote:

Ok, so i'm looking at those optsize.ll/tripcount.ll tests, and i'm not sure what exactly they are testing.
They don't specify the triple/attributes, so what costs do they expect to get?

I think what was happening is that they relied on the fact that opt doesn't default to -march=native,
and no machine supports masked gather/scatter/load/store for those types (i8) in its baseline ISA,
so the useEmulatedMaskMemRefHack() hack would always kick in, the instruction costs (and thus the vectorization cost)
would be bogusly high, and that would prevent the loops from vectorizing?

I'm not really sure what to do about them. Is my analysis wrong?
Can perhaps someone familiar with those tests comment?

tripcount.ll was added in D32451 and modified in D42946;
cc @twoh, @tejohnson, @davidxl, @mtrofin, @yamauchi

IIRC, re D42946, the fix in that patch was fundamentally about trip counts not being computed correctly, and the regression tests are variations of the pre-existing cases. I think you are correct: they all rely on there being a large cost to vectorization that makes it profitable only in certain cases (and the way the trip count is computed changes that).

I think the tests should intentionally specify a triple where the cost is high, and maybe we need to replace the instructions with ones that have a high cost?

(I'd wait for others to chime in though, it's been a while since D42946 and that was basically my brief encounter with vectorization)

Make tests that broke X86-specific.
Ping.

Hm, don't clobber the existing tests though.

Harbormaster completed remote builds in B139407: Diff 394519. Dec 15 2021, 4:45 AM

ping

As per @fhahn's "someone more familiar with X86 cost modeling should review the claim the cost model is fixed :)"

@RKSimon given all the costmodel fixes, i claim that nowadays the relevant parts of
the X86 costmodel (at least as of AVX2+) are correct, and the hack is no longer needed.
(D111460 already landed under the same pretense.)

Do you agree with my assessment?

In D114779#3260815, @lebedev.ri wrote:

As per @fhahn's "someone more familiar with X86 cost modeling should review the claim the cost model is fixed :)"

Sorry - I haven't been following this ticket much at all. Where did @fhahn say this and in what context? I can't see that here or in D43208.

In D114779#3261615, @RKSimon wrote:

In D114779#3260815, @lebedev.ri wrote:

As per @fhahn's "someone more familiar with X86 cost modeling should review the claim the cost model is fixed :)"

Sorry - I haven't been following this ticket much at all. Where did @fhahn say this and in what context? I can't see that here or in D43208.

Sorry, that was an IRC discussion.

lebedev.ri mentioned this in D115710: [LV][NFC] Update test checks using utils/update_test_checks.py. Jan 24 2022, 4:58 AM

sdesmalen added a subscriber: sdesmalen. Jan 24 2022, 5:58 AM

Post-branch ping.
I believe we can now proceed? :)

Harbormaster completed remote builds in B147073: Diff 405177. Feb 2 2022, 3:05 AM

ping

LGTM - based on the offline discussion with the various stakeholders this is the way to go, but please be ready to assist with regression cases that do come up

This revision is now accepted and ready to land. Feb 7 2022, 4:53 AM

In D114779#3300671, @RKSimon wrote:

LGTM - based on the offline discussion with the various stakeholders this is the way to go, but please be ready to assist with regression cases that do come up

Thanks! I'm going to land this now, given that we just branched
we have optimal headroom for dealing with the fallout here.

This revision was landed with ongoing or failed builds. Feb 7 2022, 5:09 AM

Closed by commit rG77a0da926c9e: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` (authored by lebedev.ri).

This revision was automatically updated to reflect the committed changes.

lebedev.ri added a commit: rG77a0da926c9e: [LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`.

Even though the initial intention of the "hack" might have been to prevent predication on some X86 subtargets, it was added in a way that affected all targets, so it could expose cost model issues on various targets if suddenly removed. I wasn't aware of this and just noticed it... have any performance runs been done on other targets (e.g. Power)?

I'm interested - what kind of performance measuring did you do for this patch, and what were the numbers it gave?

Hi,

In D114779#3306221, @bmahjour wrote:

Even though the initial intention of the "hack" might have been to prevent predication on some X86 subtargets, it was added in a way that affected all targets, so it could expose cost model issues on various targets if suddenly removed. I wasn't aware of this and just noticed it... have any performance runs been done on other targets (e.g. Power)?

In D114779#3308611, @dmgreen wrote:

I'm interested - what kind of performance measuring did you do for this patch, and what were the numbers it gave?

This is a correctness fix. It will not be at all surprising to learn that
some other architecture has started to unintentionally rely on this
erroneous behavior instead of implementing a correct and precise cost model.

If you have identified one of such places, please feel free to file a bug,
and optionally ask for a temporary revert until you've had a chance to fix said bug.
Though, it would be best to just fix the uncovered issues, it's not like the branch is soon.

dmgreen added a reverting change: rGb55d4c2ad8ea: Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`". Feb 9 2022, 12:03 PM

OK Cool.

This is a correctness fix. It will not be at all surprising to learn that
some other architecture has started to unintentionally rely on this
erroneous behavior instead of implementing a correct and precise cost model.

Sure. I just think that one of the architectures relying on this, at least in places, is X86. I was honestly surprised this didn't hurt performance in more places; I thought it was still pretty important. Hence my asking what kind of performance runs you had done. But it looks from these very tests that it's still needed for some things, and I got reports of a few more. The X86 cases did seem to be hidden more by other costs, like the cost of a vector mul being 6 :)

Could you point me to the bugreport that you have opened before temporarily reverting this? :)

In D114779#3308717, @dmgreen wrote:

OK Cool.

This is a correctness fix. It will not be at all surprising to learn that
some other architecture has started to unintentionally rely on this
erroneous behavior instead of implementing a correct and precise cost model.

Sure. I just think that one of the architectures relying on this, at least in places, is X86.

Could you please be more specific?

I was honestly surprised this didn't hurt performance in more places; I thought it was still pretty important. Hence my asking what kind of performance runs you had done.

But it looks from these very tests that it's still needed for some things

Could you please be more specific? :)

and I got reports of a few more. The X86 cases did seem to be hidden more by other costs, like the cost of a vector mul being 6 :)

lebedev.ri reopened this revision. Feb 9 2022, 12:20 PM

This revision is now accepted and ready to land. Feb 9 2022, 12:20 PM

Yeah sure, will do. But it will have to be tomorrow morning. It's been a very long day :(

I am still interested in what numbers you had for this change though?

bjope added a subscriber: bjope. Feb 10 2022, 12:30 AM

OK. I think the problems fit into 4 or maybe 5 different categories.

First up - there were some very large downstream regressions we had which certainly fit into the target-dependent bucket. We were allowing tail folding but no masked gather/scatter, which was causing some very bad codegen from scalarized loads/stores. That was simple enough to fix on Tuesday though, by disallowing the tail folding in those cases. I would still hope that the cost model would handle it, but at least those problems are no more.

The next two most obvious from the tests here are with optsize and with low trip counts. The optsize tests in LoopVectorize/optsize.ll look both much larger and much slower to me with all that branching. Not something that you ideally would want to do at Os. The tests in LoopVectorize/tripcount.ll also look worse to me, and are similar to one of the reports I got about this patch. A loop with a very low trip count is often not worth vectorizing, especially so if it is going to produce very inefficient predicated branching code.

The code in LoopVectorize/AArch64/tail-fold-uniform-memops.ll also looks worse to me. It's difficult to see why a target with a gather/scatter should choose to use predicated scalarized load/stores instead. But I haven't looked into the details. Perhaps that one fits into the target-independent cost model going wrong, but whatever it is, it sounds like something that should be fixed before we remove the hack.

Of the other cases I have, one may be similar to the low-trip-count cases. I can't really share the original, but it had a lot of other intrinsic code in it for producing matrix multiplies. There is hopefully a cut-down version here: https://godbolt.org/z/c311Y8j39, where it's difficult to see that the vectorized version will be better with all those broadcasts/blends/branches, even if it is doing more per iteration. Some of these required specific targets for the problem to come up - they could easily be hidden by other costs, like the cost of a VF2 mul being 6 under base x86. That one is under skylake; the original was AArch64 and the performance was apparently up to 160% worse, even with all the other code in the original.

The last case is more straightforward with what is getting predicated, but I'm having trouble at the moment seeing why it isn't a problem for any target. The code has some predicated loads, like this: https://godbolt.org/z/E7PdYrn4T. The vectorization seems a lot worse with so many difficult-to-predict branches.
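
For readers without access to the linked example, the loop shape can be read back from the IR quoted in the cost dump further down. The following is a hypothetical reconstruction, not the code behind the link: the function name `quantize16` is invented, the parameter names follow the IR values `%dct`, `%mf` and `%bias`, and the trip count of 16 comes from the `icmp eq ... 16` exit condition. The point it illustrates is that `mf[i]` is loaded separately on each side of the branch, so both loads become predicated once the loop is if-converted.

```cpp
#include <cstdint>

// Hypothetical reconstruction of the scalar loop from the quoted IR.
int quantize16(std::int16_t *dct, const std::uint16_t *mf,
               const std::uint16_t *bias) {
  int nz = 0;
  for (int i = 0; i < 16; ++i) {
    const std::int32_t coef = dct[i];  // sext i16 -> i32
    const std::int32_t b = bias[i];    // zext i16 -> i32
    std::uint32_t scaled;
    if (coef > 0) {
      // if.then: mf[i] is loaded only when this path is taken.
      scaled = (static_cast<std::uint32_t>(b + coef) * mf[i]) >> 16;
    } else {
      // if.else: a second, separately predicated load of mf[i].
      scaled = 0u - ((static_cast<std::uint32_t>(b - coef) * mf[i]) >> 16);
    }
    // trunc to i16; this relies on modular narrowing, which is guaranteed
    // from C++20 and is what mainstream compilers do before that.
    dct[i] = static_cast<std::int16_t>(scaled);
    nz |= dct[i];
  }
  return nz;
}
```

Without masked-load support for the element type, each of those two loads turns into a branch per vector lane, which is where the predicted-branch concern above comes from.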

In that case the vplan it is executing looks like this:

VPlan 'Initial VPlan for VF={2,4},UF>=1' {
Live-in vp<%0> = vector-trip-count

<x1> vector loop: {
  for.body:
    EMIT vp<%1> = CANONICAL-INDUCTION
    WIDEN-INDUCTION %indvars.iv = phi 0, %indvars.iv.next
    WIDEN-REDUCTION-PHI ir<%nz.055> = phi ir<0>, ir<%or>
    CLONE ir<%arrayidx> = getelementptr ir<%dct>, ir<%indvars.iv>
    WIDEN ir<%0> = load ir<%arrayidx>
    WIDEN ir<%conv> = sext ir<%0>
    WIDEN ir<%cmp1> = icmp ir<%0>, ir<0>
    CLONE ir<%arrayidx4> = getelementptr ir<%bias>, ir<%indvars.iv>
    WIDEN ir<%1> = load ir<%arrayidx4>
    WIDEN ir<%conv5> = zext ir<%1>
  Successor(s): if.else

  if.else:
    WIDEN ir<%sub> = sub ir<%conv5>, ir<%conv>
    EMIT vp<%12> = not ir<%cmp1>
  Successor(s): pred.load

  <xVFxUF> pred.load: {
    pred.load.entry:
      BRANCH-ON-MASK vp<%12>
    Successor(s): pred.load.if, pred.load.continue
    CondBit: vp<%12> (if.else)

    pred.load.if:
      REPLICATE ir<%arrayidx22> = getelementptr ir<%mf>, ir<%indvars.iv>
      REPLICATE ir<%4> = load ir<%arrayidx22> (S->V)
    Successor(s): pred.load.continue

    pred.load.continue:
      PHI-PREDICATED-INSTRUCTION vp<%15> = ir<%4>
    No successors
  }
  Successor(s): if.else.0

  if.else.0:
    WIDEN ir<%conv23> = zext vp<%15>
    WIDEN ir<%mul24> = mul ir<%sub>, ir<%conv23>
    WIDEN ir<%5> = lshr ir<%mul24>, ir<16>
    WIDEN ir<%6> = trunc ir<%5>
    WIDEN ir<%conv27> = sub ir<0>, ir<%6>
  Successor(s): if.then

  if.then:
    WIDEN ir<%add> = add ir<%conv5>, ir<%conv>
  Successor(s): pred.load

  <xVFxUF> pred.load: {
    pred.load.entry:
      BRANCH-ON-MASK ir<%cmp1>
    Successor(s): pred.load.if, pred.load.continue
    CondBit: ir<%cmp1>

    pred.load.if:
      REPLICATE ir<%arrayidx10> = getelementptr ir<%mf>, ir<%indvars.iv>
      REPLICATE ir<%2> = load ir<%arrayidx10> (S->V)
    Successor(s): pred.load.continue

    pred.load.continue:
      PHI-PREDICATED-INSTRUCTION vp<%24> = ir<%2>
    No successors
  }
  Successor(s): if.then.0

  if.then.0:
    WIDEN ir<%conv11> = zext vp<%24>
    WIDEN ir<%mul> = mul ir<%add>, ir<%conv11>
    WIDEN ir<%3> = lshr ir<%mul>, ir<16>
    WIDEN ir<%conv12> = trunc ir<%3>
  Successor(s): if.end

  if.end:
    BLEND %storemerge = ir<%conv27>/vp<%12> ir<%conv12>/ir<%cmp1>
    WIDEN store ir<%arrayidx>, ir<%storemerge>
    WIDEN ir<%conv32> = sext ir<%storemerge>
    WIDEN ir<%or> = or ir<%nz.055>, ir<%conv32>
    EMIT vp<%33> = VF * UF +(nuw)  vp<%1>
    EMIT branch-on-count  vp<%33> vp<%0>
  No successors
}

The costs look like:

LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %nz.055 = phi i32 [ 0, %entry ], [ %or, %if.end ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds i16, i16* %dct, i64 %indvars.iv
LV: Found an estimated cost of 1 for VF 4 For instruction:   %0 = load i16, i16* %arrayidx, align 2, !tbaa !3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv = sext i16 %0 to i32
LV: Found an estimated cost of 1 for VF 4 For instruction:   %cmp1 = icmp sgt i16 %0, 0
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx4 = getelementptr inbounds i16, i16* %bias, i64 %indvars.iv
LV: Found an estimated cost of 1 for VF 4 For instruction:   %1 = load i16, i16* %arrayidx4, align 2, !tbaa !3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv5 = zext i16 %1 to i32
LV: Found an estimated cost of 4 for VF 4 For instruction:   br i1 %cmp1, label %if.then, label %if.else
LV: Found an estimated cost of 1 for VF 4 For instruction:   %sub = sub nsw i32 %conv5, %conv
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx22 = getelementptr inbounds i16, i16* %mf, i64 %indvars.iv
LV: Found an estimated cost of 4 for VF 4 For instruction:   %4 = load i16, i16* %arrayidx22, align 2, !tbaa !3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv23 = zext i16 %4 to i32
LV: Found an estimated cost of 2 for VF 4 For instruction:   %mul24 = mul nsw i32 %sub, %conv23
LV: Found an estimated cost of 1 for VF 4 For instruction:   %5 = lshr i32 %mul24, 16
LV: Found an estimated cost of 2 for VF 4 For instruction:   %6 = trunc i32 %5 to i16
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv27 = sub i16 0, %6
LV: Found an estimated cost of 0 for VF 4 For instruction:   br label %if.end
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add = add nsw i32 %conv5, %conv
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx10 = getelementptr inbounds i16, i16* %mf, i64 %indvars.iv
LV: Found an estimated cost of 4 for VF 4 For instruction:   %2 = load i16, i16* %arrayidx10, align 2, !tbaa !3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv11 = zext i16 %2 to i32
LV: Found an estimated cost of 2 for VF 4 For instruction:   %mul = mul nsw i32 %add, %conv11
LV: Found an estimated cost of 1 for VF 4 For instruction:   %3 = lshr i32 %mul, 16
LV: Found an estimated cost of 2 for VF 4 For instruction:   %conv12 = trunc i32 %3 to i16
LV: Found an estimated cost of 0 for VF 4 For instruction:   br label %if.end
LV: Found an estimated cost of 1 for VF 4 For instruction:   %storemerge = phi i16 [ %conv27, %if.else ], [ %conv12, %if.then ]
LV: Found an estimated cost of 1 for VF 4 For instruction:   store i16 %storemerge, i16* %arrayidx, align 2, !tbaa !3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv32 = sext i16 %storemerge to i32
LV: Found an estimated cost of 1 for VF 4 For instruction:   %or = or i32 %nz.055, %conv32
LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond.not = icmp eq i64 %indvars.iv.next, 16
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !7
LV: Vector loop of width 4 costs: 9.
LV: Selecting VF: 4.

So the cost of all that extra branching is accounted for in either the br i1 %cmp1 or the costs of the loads? The cost of any branch is usually 0 in llvm, but adding this many difficult-to-predict branches into an inner loop is going to cause problems, no matter how good the core is at predicting them. The cost of the predicated scalarized loads comes from getMemInstScalarizationCost as far as I understand.
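
As a sanity check on the numbers above, the per-instruction costs can be added up by hand. This sketch assumes the reported loop cost is simply the sum of the per-instruction costs divided by the vectorization factor, truncated to an integer; that is an assumption about how the debug line is produced, though it is consistent with the logged value of 9. The names `kCostsVF4`, `totalCost` and `perLaneCost` are made up for the example.

```cpp
#include <array>
#include <cstdint>
#include <numeric>

// Per-instruction costs for VF 4, copied in order from the debug output above.
constexpr std::array<std::uint64_t, 34> kCostsVF4 = {
    0, 0, 0, 1, 1, 1, 0, 1, 1, 4,  // for.body, ending in the conditional br
    1, 0, 4, 1, 2, 1, 2, 1, 0,     // if.else, including one predicated load
    1, 0, 4, 1, 2, 1, 2, 0,        // if.then, including one predicated load
    1, 1, 1, 1, 1, 1, 0};          // if.end

// Sum of all instruction costs for one vector iteration.
std::uint64_t totalCost() {
  return std::accumulate(kCostsVF4.begin(), kCostsVF4.end(),
                         std::uint64_t{0});
}

// Cost per original scalar iteration, using integer division.
std::uint64_t perLaneCost(std::uint64_t total, unsigned vf) {
  return total / vf;
}
```

The total comes to 38, of which the two predicated loads contribute 8 and the conditional branch 4, and 38 / 4 truncates to 9. So under this reading the whole predication overhead is represented by 12 cost units per vector iteration.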

That is what I thought this "useEmulatedMaskMemRefHack" was protecting against - the fact that the costs via getMemInstScalarizationCost for predicated loads/stores weren't really good enough. And that the vplan for predication can end up quite different from the original code, but at present the costs are all just added up from the original instructions. I'm surprised this doesn't come up in more cases, to be honest. The cases we had where this was making things worse were not as widespread as I would have imagined they would be. But the -Os vectorization and low trip counts are pretty significant regressions.

The real best long term fix for this (that doesn't introduce other hacks) might be to properly implement a vplan-based cost-model in the vectorizer. So that it is really adding up the costs of the things that will be produced by the vectorizer, not trying to guess them from the original instructions. I suspect we may need something to add a cost for all the predicated branching too, although I'm not sure what exactly (what is the cost of a branch? :) ) There will be a point where the benefit of vectorization will make some branching profitable, so it would be great to remove the Hack if we can do so.

lebedev.ri abandoned this revision. Oct 18 2022, 5:49 PM

Herald added a project: Restricted Project. Oct 18 2022, 5:49 PM

Herald added subscribers: pcwang-thead, shiva0217, StephenFan.

qianzhen added a subscriber: qianzhen. Oct 24 2022, 11:19 AM

Revision Contents

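Size        Path
34 lines    llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
40 lines    llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll
40 lines    llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll
36 lines    llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
24 lines    llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
46 lines    llvm/test/Analysis/CostModel/X86/masked-load-i16.ll
16 lines    llvm/test/Analysis/CostModel/X86/masked-load-i32.ll
16 lines    llvm/test/Analysis/CostModel/X86/masked-load-i64.ll
46 lines    llvm/test/Analysis/CostModel/X86/masked-load-i8.ll
159 lines   llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
1176 lines  llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
1041 lines  llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
6 lines     llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
6 lines     llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll
837 lines   llvm/test/Transforms/LoopVectorize/optsize.ll
673 lines   llvm/test/Transforms/LoopVectorize/tripcount.ll
4 lines     llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll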

Diff 406405

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 301 lines not shown @@ cl::desc(
         "Enable runtime interleaving until load/store ports are saturated"));

 /// Interleave small loops with scalar reductions.
 static cl::opt<bool> InterleaveSmallLoopScalarReduction(
     "interleave-small-loop-scalar-reduction", cl::init(false), cl::Hidden,
     cl::desc("Enable interleaving for loops with small iteration counts that "
              "contain scalar reductions to expose ILP."));

-/// The number of stores in a loop that are allowed to need predication.
-static cl::opt<unsigned> NumberOfStoresToPredicate(
-    "vectorize-num-stores-pred", cl::init(1), cl::Hidden,
-    cl::desc("Max number of stores to be predicated behind an if."));
-
 static cl::opt<bool> EnableIndVarRegisterHeur(
     "enable-ind-var-reg-heur", cl::init(true), cl::Hidden,
     cl::desc("Count the induction variable only once when interleaving"));

 static cl::opt<bool> EnableCondStoresVectorization(
     "enable-cond-stores-vec", cl::init(true), cl::Hidden,
     cl::desc("Enable if predication of stores during vectorization."));

@@ 1,469 lines not shown @@ private:
   /// convenience wrapper for the type-based getScalarizationOverhead API.
   InstructionCost getScalarizationOverhead(Instruction *I,
                                            ElementCount VF) const;

   /// Returns whether the instruction is a load or store and will be a emitted
   /// as a vector operation.
   bool isConsecutiveLoadOrStore(Instruction *I);

-  /// Returns true if an artificially high cost for emulated masked memrefs
-  /// should be used.
-  bool useEmulatedMaskMemRefHack(Instruction *I, ElementCount VF);
-
   /// Map of scalar integer values to the smallest bitwidth they can be legally
   /// represented as. The vector equivalents of these values should be truncated
   /// to this type.
   MapVector<Instruction *, uint64_t> MinBWs;

   /// A type representing the costs for instructions if they were to be
   /// scalarized rather than vectorized. The entries are Instruction-Cost
   /// pairs.
@@ 4,620 lines not shown @@ for (unsigned i = 0, e = VFs.size(); i < e; ++i) {
     RU.LoopInvariantRegs = Invariant;
     RU.MaxLocalUsers = MaxUsages[i];
     RUs[i] = RU;
   }

   return RUs;
 }

-bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I,
-                                                           ElementCount VF) {
-  // TODO: Cost model for emulated masked load/store is completely
-  // broken. This hack guides the cost model to use an artificially
-  // high enough value to practically disable vectorization with such
-  // operations, except where previously deployed legality hack allowed
-  // using very low cost values. This is to avoid regressions coming simply
-  // from moving "masked load/store" check from legality to cost model.
-  // Masked Load/Gather emulation was previously never allowed.
-  // Limited number of Masked Store/Scatter emulation was allowed.
-  assert(isPredicatedInst(I, VF) && "Expecting a scalar emulated instruction");
-  return isa<LoadInst>(I) ||
-         (isa<StoreInst>(I) &&
-          NumPredStores > NumberOfStoresToPredicate);
-}
-
 void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
   // If we aren't vectorizing the loop, or if we've already collected the
   // instructions to scalarize, there's nothing to do. Collection may already
   // have occurred if we have a user-selected VF and are now computing the
   // expected cost for interleaving.
   if (VF.isScalar() || VF.isZero() ||
       InstsToScalarize.find(VF) != InstsToScalarize.end())
     return;
@@ 9 lines not shown @@ void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
   for (BasicBlock *BB : TheLoop->blocks()) {
     if (!blockNeedsPredicationForAnyReason(BB))
       continue;
     for (Instruction &I : *BB)
       if (isScalarWithPredication(&I, VF)) {
         ScalarCostsTy ScalarCosts;
         // Do not apply discount if scalable, because that would lead to
         // invalid scalarization costs.
-        // Do not apply discount logic if hacked cost is needed
-        // for emulated masked memrefs.
-        if (!VF.isScalable() && !useEmulatedMaskMemRefHack(&I, VF) &&
+        if (!VF.isScalable() &&
             computePredInstDiscount(&I, ScalarCosts, VF) >= 0)
           ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end());
         // Remember that BB will remain after vectorization.
         PredicatedBBsAfterVectorization.insert(BB);
       }
   }
 }

@@ 239 lines not shown @@ if (isPredicatedInst(I, VF)) {

     // Add the cost of an i1 extract and a branch
     auto *Vec_i1Ty =
         VectorType::get(IntegerType::getInt1Ty(ValTy->getContext()), VF);
     Cost += TTI.getScalarizationOverhead(
         Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
         /*Insert=*/false, /*Extract=*/true);
     Cost += TTI.getCFInstrCost(Instruction::Br, TTI::TCK_RecipThroughput);
-
-    if (useEmulatedMaskMemRefHack(I, VF))
-      // Artificially setting to a high enough value to practically disable
-      // vectorization with such operations.
-      Cost = 3000000;
   }

   return Cost;
 }

 InstructionCost
 LoopVectorizationCostModel::getConsecutiveMemOpCost(Instruction *I,
                                                     ElementCount VF) {
@@ 4,060 lines not shown @@

llvm/test/Analysis/CostModel/X86/masked-gather-i32-with-i8-index.ll

@@ 11 lines not shown @@

 @A = global [1024 x i8] zeroinitializer, align 128
 @B = global [1024 x i32] zeroinitializer, align 128
 @C = global [1024 x i32] zeroinitializer, align 128

 ; CHECK: LV: Checking a loop in "test"
 ;
 ; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
 ;
 ; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
 ;
 ; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
-; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
+; AVX1: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 9 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-SLOWGATHER: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-FASTGATHER: LV: Found an estimated cost of 6 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 6 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-FASTGATHER: LV: Found an estimated cost of 12 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 12 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-FASTGATHER: LV: Found an estimated cost of 24 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 24 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 22 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 11 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 18 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 36 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX512: LV: Found an estimated cost of 72 for VF 64 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX512: LV: Found an estimated cost of 72 for VF 64 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF {{[0-9]+}} For instruction: %valB.loaded = load i32, i32* %inB, align 4			; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF {{[0-9]+}} For instruction: %valB.loaded = load i32, i32* %inB, align 4
	define void @test() {			define void @test() {
	entry:			entry:

llvm/test/Analysis/CostModel/X86/masked-gather-i64-with-i8-index.ll

	@A = global [1024 x i8] zeroinitializer, align 128			@A = global [1024 x i8] zeroinitializer, align 128
	@B = global [1024 x i64] zeroinitializer, align 128			@B = global [1024 x i64] zeroinitializer, align 128
	@C = global [1024 x i64] zeroinitializer, align 128			@C = global [1024 x i64] zeroinitializer, align 128

	; CHECK: LV: Checking a loop in "test"			; CHECK: LV: Checking a loop in "test"
	;			;
	; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-SLOWGATHER: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-FASTGATHER: LV: Found an estimated cost of 6 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 6 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-FASTGATHER: LV: Found an estimated cost of 12 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 12 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-FASTGATHER: LV: Found an estimated cost of 24 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 24 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX2-FASTGATHER: LV: Found an estimated cost of 48 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 10 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 24 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 12 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 40 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX512: LV: Found an estimated cost of 80 for VF 64 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX512: LV: Found an estimated cost of 80 for VF 64 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF {{[0-9]+}} For instruction: %valB.loaded = load i64, i64* %inB, align 8			; CHECK-NOT: LV: Found an estimated cost of {{[0-9]+}} for VF {{[0-9]+}} For instruction: %valB.loaded = load i64, i64* %inB, align 8
	define void @test() {			define void @test() {
	entry:			entry:

llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll

	; y[i] = points[i*4 + 1];			; y[i] = points[i*4 + 1];
	; }			; }

	; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test2"			; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2

	; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"			; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i2 = load i16, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx7, align 2

	define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) {			define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) {
	entry:			entry:
	%cmp15 = icmp sgt i32 %numPoints, 0			%cmp15 = icmp sgt i32 %numPoints, 0
	[35 lines not shown]
	; for(i=0;i<1024;i++){			; for(i=0;i<1024;i++){
	; if (x[i] > 0)			; if (x[i] > 0)
	; x[i] = points[i*3];			; x[i] = points[i*3];
	; }			; }

	; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test"			; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test"
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 17 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2

	; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test"			; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test"
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 7 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 9 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: %i4 = load i16, i16* %arrayidx6, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 16 For instruction: %i4 = load i16, i16* %arrayidx6, align 2

	define void @test(i16* noalias nocapture %points, i16* noalias nocapture readonly %x, i16* noalias nocapture readnone %y) {			define void @test(i16* noalias nocapture %points, i16* noalias nocapture readonly %x, i16* noalias nocapture readnone %y) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]

llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll

	; points[i*4 + 1] = y[i];			; points[i*4 + 1] = y[i];
	; }			; }

	; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test2"			; DISABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 5 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 11 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 23 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 50 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 3000000 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2			; DISABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2

	; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"			; ENABLED_MASKED_STRIDED: LV: Checking a loop in "test2"
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 1 for VF 1 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 2 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 10 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 2 for VF 2 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 4 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 4 for VF 4 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 8 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 14 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 8 for VF 8 For instruction: store i16 %2, i16* %arrayidx7, align 2
	;			;
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 0 for VF 16 For instruction: store i16 %0, i16* %arrayidx2, align 2
	; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 27 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2			; ENABLED_MASKED_STRIDED: LV: Found an estimated cost of 20 for VF 16 For instruction: store i16 %2, i16* %arrayidx7, align 2

	define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) {			define void @test2(i16* noalias nocapture %points, i32 %numPoints, i16* noalias nocapture readonly %x, i16* noalias nocapture readonly %y) {
	entry:			entry:
	%cmp15 = icmp sgt i32 %numPoints, 0			%cmp15 = icmp sgt i32 %numPoints, 0
	br i1 %cmp15, label %for.body.preheader, label %for.end			br i1 %cmp15, label %for.body.preheader, label %for.end

	for.body.preheader:			for.body.preheader:
	%wide.trip.count = zext i32 %numPoints to i64			%wide.trip.count = zext i32 %numPoints to i64

llvm/test/Analysis/CostModel/X86/masked-load-i16.ll

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [1024 x i8] zeroinitializer, align 128			@A = global [1024 x i8] zeroinitializer, align 128
	@C = global [1024 x i16] zeroinitializer, align 128			@C = global [1024 x i16] zeroinitializer, align 128

	; CHECK: LV: Checking a loop in "test"			; CHECK: LV: Checking a loop in "test"
	;			;
	; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE2: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE2: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE2: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	;			;
	; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE42: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE42: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; SSE42: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	;			;
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX1: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	;			;
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-SLOWGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	;			;
	; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 17 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX2-FASTGATHER: LV: Found an estimated cost of 34 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 1 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 1 for VF 8 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 1 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 1 for VF 16 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 1 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 1 for VF 32 For instruction: %valB.loaded = load i16, i16* %inB, align 2
	; AVX512: LV: Found an estimated cost of 2 for VF 64 For instruction: %valB.loaded = load i16, i16* %inB, align 2			; AVX512: LV: Found an estimated cost of 2 for VF 64 For instruction: %valB.loaded = load i16, i16* %inB, align 2

llvm/test/Analysis/CostModel/X86/masked-load-i32.ll

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [1024 x i8] zeroinitializer, align 128			@A = global [1024 x i8] zeroinitializer, align 128
	@C = global [1024 x i32] zeroinitializer, align 128			@C = global [1024 x i32] zeroinitializer, align 128

	; CHECK: LV: Checking a loop in "test"			; CHECK: LV: Checking a loop in "test"
	;			;
	; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE2: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; SSE42: LV: Found an estimated cost of 22 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 3 for VF 2 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 2 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 2 for VF 8 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 4 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 4 for VF 16 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	; AVX1: LV: Found an estimated cost of 8 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4			; AVX1: LV: Found an estimated cost of 8 for VF 32 For instruction: %valB.loaded = load i32, i32* %inB, align 4
	;			;

llvm/test/Analysis/CostModel/X86/masked-load-i64.ll

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [1024 x i8] zeroinitializer, align 128			@A = global [1024 x i8] zeroinitializer, align 128
	@C = global [1024 x i64] zeroinitializer, align 128			@C = global [1024 x i64] zeroinitializer, align 128

	; CHECK: LV: Checking a loop in "test"			; CHECK: LV: Checking a loop in "test"
	;			;
	; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE2: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 10 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; SSE42: LV: Found an estimated cost of 20 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 4 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 4 for VF 8 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 8 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 8 for VF 16 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	; AVX1: LV: Found an estimated cost of 16 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8			; AVX1: LV: Found an estimated cost of 16 for VF 32 For instruction: %valB.loaded = load i64, i64* %inB, align 8
	;			;

llvm/test/Analysis/CostModel/X86/masked-load-i8.ll

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@A = global [1024 x i8] zeroinitializer, align 128			@A = global [1024 x i8] zeroinitializer, align 128
	@C = global [1024 x i8] zeroinitializer, align 128			@C = global [1024 x i8] zeroinitializer, align 128

	; CHECK: LV: Checking a loop in "test"			; CHECK: LV: Checking a loop in "test"
	;			;
	; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE2: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE2: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE2: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE2: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE2: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE2: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE2: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE2: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE2: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	;			;
	; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE42: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE42: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE42: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE42: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE42: LV: Found an estimated cost of 5 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE42: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE42: LV: Found an estimated cost of 11 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; SSE42: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; SSE42: LV: Found an estimated cost of 23 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	;			;
	; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX1: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX1: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX1: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX1: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX1: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX1: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	;			;
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-SLOWGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-SLOWGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	;			;
	; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 4 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 8 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 16 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX2-FASTGATHER: LV: Found an estimated cost of 3000000 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX2-FASTGATHER: LV: Found an estimated cost of 33 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	;			;
	; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 2 for VF 2 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 2 for VF 4 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 2 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 2 for VF 8 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 1 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 1 for VF 16 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 1 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 1 for VF 32 For instruction: %valB.loaded = load i8, i8* %inB, align 1
	; AVX512: LV: Found an estimated cost of 1 for VF 64 For instruction: %valB.loaded = load i8, i8* %inB, align 1			; AVX512: LV: Found an estimated cost of 1 for VF 64 For instruction: %valB.loaded = load i8, i8* %inB, align 1

llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s			; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s

	; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64			; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64
	; so that TTI.isLegalMaskedLoad can return true.			; so that TTI.isLegalMaskedLoad can return true.

	target triple = "aarch64-linux-gnu"			target triple = "aarch64-linux-gnu"

	; The original loop had an unconditional uniform load. Let's make sure			; The original loop had an unconditional uniform load. Let's make sure
	; we don't artificially create new predicated blocks for the load.			; we don't artificially create new predicated blocks for the load.
	define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 {			define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 {
	; CHECK-LABEL: @uniform_load(			; CHECK-LABEL: @uniform_load(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], 3
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]])
	; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[SRC:%.*]], align 4
	; CHECK-NOT: load i32, i32* %src, align 4			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <4 x i32>*
	; CHECK-NEXT: [[STORE_PTR:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*			; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[BROADCAST_SPLAT]], <4 x i32>* [[TMP4]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]])
	; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[TMP5]], <4 x i32>* [[STORE_PTR]], i32 4, <4 x i1> [[LOOP_PRED]])			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[IDX_NEXT]] = add i64 [[IDX]], 4			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IDX_NEXT]], %n.vec			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK-NEXT: br i1 [[CMP]], label %middle.block, label %vector.body			; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[SRC]], align 4
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store i32 [[VAL]], i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%val = load i32, i32* %src, align 4			%val = load i32, i32* %src, align 4
	%arrayidx = getelementptr inbounds i32, i32* %dst, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %dst, i64 %indvars.iv
	store i32 %val, i32* %arrayidx, align 4			store i32 %val, i32* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond.not = icmp eq i64 %indvars.iv.next, %n			%exitcond.not = icmp eq i64 %indvars.iv.next, %n
	br i1 %exitcond.not, label %for.end, label %for.body			br i1 %exitcond.not, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	; The original loop had a conditional uniform load. In this case we actually			; The original loop had a conditional uniform load. In this case we actually
	; do need to perform conditional loads and so we end up using a gather instead.			; do need to perform conditional loads and so we end up using a gather instead.
	; However, we at least ensure the mask is the overlap of the loop predicate			; However, we at least ensure the mask is the overlap of the loop predicate
	; and the original condition.			; and the original condition.
	define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {			define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {
	; CHECK-LABEL: @cond_uniform_load(			; CHECK-LABEL: @cond_uniform_load(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[DST1:%.*]] = bitcast i32* [[DST:%.*]] to i8*
				; CHECK-NEXT: [[COND3:%.*]] = bitcast i32* [[COND:%.*]] to i8*
				; CHECK-NEXT: [[SRC6:%.*]] = bitcast i32* [[SRC:%.*]] to i8*
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[DST]], i64 [[N:%.*]]
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast i32* [[SCEVGEP]] to i8*
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i32, i32* [[COND]], i64 [[N]]
				; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast i32* [[SCEVGEP4]] to i8*
				; CHECK-NEXT: [[SCEVGEP7:%.*]] = getelementptr i32, i32* [[SRC]], i64 1
				; CHECK-NEXT: [[SCEVGEP78:%.*]] = bitcast i32* [[SCEVGEP7]] to i8*
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP45]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[COND3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: [[BOUND09:%.*]] = icmp ult i8* [[DST1]], [[SCEVGEP78]]
				; CHECK-NEXT: [[BOUND110:%.*]] = icmp ult i8* [[SRC6]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT11:%.*]] = and i1 [[BOUND09]], [[BOUND110]]
				; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT11]]
				; CHECK-NEXT: br i1 [[CONFLICT_RDX]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], 3
	; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX12:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT19:%.*]], [[PRED_LOAD_CONTINUE18:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX12]], 0
	; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP0]], i64 [[N]])
	; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison)			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
				; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison), !alias.scope !4
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer
	; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef)			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP6]], i32 0
				; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
				; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> poison, i32 [[TMP8]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP10:%.*]] = phi <4 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP9]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP6]], i32 1
				; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]]
				; CHECK: pred.load.if13:
				; CHECK-NEXT: [[TMP12:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
				; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[TMP12]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]]
				; CHECK: pred.load.continue14:
				; CHECK-NEXT: [[TMP14:%.*]] = phi <4 x i32> [ [[TMP10]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP13]], [[PRED_LOAD_IF13]] ]
				; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP6]], i32 2
				; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]]
				; CHECK: pred.load.if15:
				; CHECK-NEXT: [[TMP16:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> [[TMP14]], i32 [[TMP16]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]]
				; CHECK: pred.load.continue16:
				; CHECK-NEXT: [[TMP18:%.*]] = phi <4 x i32> [ [[TMP14]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP17]], [[PRED_LOAD_IF15]] ]
				; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i1> [[TMP6]], i32 3
				; CHECK-NEXT: br i1 [[TMP19]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18]]
				; CHECK: pred.load.if17:
				; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[SRC]], align 4, !alias.scope !7
				; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i32> [[TMP18]], i32 [[TMP20]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]]
				; CHECK: pred.load.continue18:
				; CHECK-NEXT: [[TMP22:%.*]] = phi <4 x i32> [ [[TMP18]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP21]], [[PRED_LOAD_IF17]] ]
				; CHECK-NEXT: [[TMP23:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i1> [[TMP4]], <4 x i1> zeroinitializer
				; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP23]], <4 x i32> zeroinitializer, <4 x i32> [[TMP22]]
				; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP25:%.*]] = or <4 x i1> [[TMP6]], [[TMP23]]
				; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i32, i32* [[TMP24]], i32 0
				; CHECK-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP26]] to <4 x i32>*
				; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[PREDPHI]], <4 x i32>* [[TMP27]], i32 4, <4 x i1> [[TMP25]]), !alias.scope !9, !noalias !11
				; CHECK-NEXT: [[INDEX_NEXT19]] = add i64 [[INDEX12]], 4
				; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT19]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[COND]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TOBOOL_NOT:%.*]] = icmp eq i32 [[TMP29]], 0
				; CHECK-NEXT: br i1 [[TOBOOL_NOT]], label [[IF_END]], label [[IF_THEN:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* [[SRC]], align 4
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: [[VAL_0:%.*]] = phi i32 [ [[TMP30]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDEX]]
				; CHECK-NEXT: store i32 [[VAL_0]], i32* [[ARRAYIDX1]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 1
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %if.end			for.body: ; preds = %entry, %if.end
	%index = phi i64 [ %index.next, %if.end ], [ 0, %entry ]			%index = phi i64 [ %index.next, %if.end ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %cond, i64 %index			%arrayidx = getelementptr inbounds i32, i32* %cond, i64 %index
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%tobool.not = icmp eq i32 %0, 0			%tobool.not = icmp eq i32 %0, 0

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll

	;}			;}

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger, i32* noalias %index) {			define void @foo1(float* noalias %in, float* noalias %out, i32* noalias %trigger, i32* noalias %index) {
	; AVX512-LABEL: @foo1(			; AVX512-LABEL: @foo1(
	; AVX512-NEXT: iter.check:			; AVX512-NEXT: iter.check:
	; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]			; AVX512-NEXT: br label [[VECTOR_BODY:%.*]]
	; AVX512: vector.body:			; AVX512: vector.body:
	; AVX512-NEXT: [[INDEX8:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ]			; AVX512-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ITER_CHECK:%.*]] ], [ [[INDEX_NEXT_3:%.*]], [[VECTOR_BODY]] ]
	; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX8]]			; AVX512-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]]
	; AVX512-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <16 x i32>*			; AVX512-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP1]], align 4			; AVX512-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP1]], align 4
	; AVX512-NEXT: [[TMP2:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD]], zeroinitializer			; AVX512-NEXT: [[TMP2:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD]], zeroinitializer
	; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX8]]			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]]
	; AVX512-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <16 x i32>*			; AVX512-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP4]], i32 4, <16 x i1> [[TMP2]], <16 x i32> poison)			; AVX512-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP4]], i32 4, <16 x i1> [[TMP2]], <16 x i32> poison)
	; AVX512-NEXT: [[TMP5:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD]] to <16 x i64>			; AVX512-NEXT: [[TMP5:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD]] to <16 x i64>
	; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <16 x i64> [[TMP5]]			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <16 x i64> [[TMP5]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP6]], i32 4, <16 x i1> [[TMP2]], <16 x float> undef)			; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP6]], i32 4, <16 x i1> [[TMP2]], <16 x float> undef)
	; AVX512-NEXT: [[TMP7:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>			; AVX512-NEXT: [[TMP7:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
	; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX8]]			; AVX512-NEXT: [[TMP8:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]]
	; AVX512-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP8]] to <16 x float>*			; AVX512-NEXT: [[TMP9:%.*]] = bitcast float* [[TMP8]] to <16 x float>*
	; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP7]], <16 x float>* [[TMP9]], i32 4, <16 x i1> [[TMP2]])			; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP7]], <16 x float>* [[TMP9]], i32 4, <16 x i1> [[TMP2]])
	; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX8]], 16			; AVX512-NEXT: [[INDEX_NEXT:%.*]] = or i64 [[INDEX7]], 16
	; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT]]			; AVX512-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT]]
	; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>*			; AVX512-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_LOAD_1:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4			; AVX512-NEXT: [[WIDE_LOAD_1:%.*]] = load <16 x i32>, <16 x i32>* [[TMP11]], align 4
	; AVX512-NEXT: [[TMP12:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_1]], zeroinitializer			; AVX512-NEXT: [[TMP12:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_1]], zeroinitializer
	; AVX512-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT]]			; AVX512-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT]]
	; AVX512-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP13]] to <16 x i32>*			; AVX512-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP13]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_MASKED_LOAD_1:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP14]], i32 4, <16 x i1> [[TMP12]], <16 x i32> poison)			; AVX512-NEXT: [[WIDE_MASKED_LOAD_1:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP14]], i32 4, <16 x i1> [[TMP12]], <16 x i32> poison)
	; AVX512-NEXT: [[TMP15:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_1]] to <16 x i64>			; AVX512-NEXT: [[TMP15:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_1]] to <16 x i64>
	; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP15]]			; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP15]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER_1:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP16]], i32 4, <16 x i1> [[TMP12]], <16 x float> undef)			; AVX512-NEXT: [[WIDE_MASKED_GATHER_1:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP16]], i32 4, <16 x i1> [[TMP12]], <16 x float> undef)
	; AVX512-NEXT: [[TMP17:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_1]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>			; AVX512-NEXT: [[TMP17:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_1]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
	; AVX512-NEXT: [[TMP18:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT]]			; AVX512-NEXT: [[TMP18:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT]]
	; AVX512-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP18]] to <16 x float>*			; AVX512-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP18]] to <16 x float>*
	; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP17]], <16 x float>* [[TMP19]], i32 4, <16 x i1> [[TMP12]])			; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP17]], <16 x float>* [[TMP19]], i32 4, <16 x i1> [[TMP12]])
	; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX8]], 32			; AVX512-NEXT: [[INDEX_NEXT_1:%.*]] = or i64 [[INDEX7]], 32
	; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_1]]			; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_1]]
	; AVX512-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <16 x i32>*			; AVX512-NEXT: [[TMP21:%.*]] = bitcast i32* [[TMP20]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_LOAD_2:%.*]] = load <16 x i32>, <16 x i32>* [[TMP21]], align 4			; AVX512-NEXT: [[WIDE_LOAD_2:%.*]] = load <16 x i32>, <16 x i32>* [[TMP21]], align 4
	; AVX512-NEXT: [[TMP22:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_2]], zeroinitializer			; AVX512-NEXT: [[TMP22:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_2]], zeroinitializer
	; AVX512-NEXT: [[TMP23:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT_1]]			; AVX512-NEXT: [[TMP23:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT_1]]
	; AVX512-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <16 x i32>*			; AVX512-NEXT: [[TMP24:%.*]] = bitcast i32* [[TMP23]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_MASKED_LOAD_2:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP24]], i32 4, <16 x i1> [[TMP22]], <16 x i32> poison)			; AVX512-NEXT: [[WIDE_MASKED_LOAD_2:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP24]], i32 4, <16 x i1> [[TMP22]], <16 x i32> poison)
	; AVX512-NEXT: [[TMP25:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_2]] to <16 x i64>			; AVX512-NEXT: [[TMP25:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_2]] to <16 x i64>
	; AVX512-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP25]]			; AVX512-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP25]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER_2:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP26]], i32 4, <16 x i1> [[TMP22]], <16 x float> undef)			; AVX512-NEXT: [[WIDE_MASKED_GATHER_2:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP26]], i32 4, <16 x i1> [[TMP22]], <16 x float> undef)
	; AVX512-NEXT: [[TMP27:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_2]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>			; AVX512-NEXT: [[TMP27:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_2]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
	; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_1]]			; AVX512-NEXT: [[TMP28:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_1]]
	; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>*			; AVX512-NEXT: [[TMP29:%.*]] = bitcast float* [[TMP28]] to <16 x float>*
	; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP27]], <16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP22]])			; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP27]], <16 x float>* [[TMP29]], i32 4, <16 x i1> [[TMP22]])
	; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX8]], 48			; AVX512-NEXT: [[INDEX_NEXT_2:%.*]] = or i64 [[INDEX7]], 48
	; AVX512-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_2]]			; AVX512-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDEX_NEXT_2]]
	; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32* [[TMP30]] to <16 x i32>*			; AVX512-NEXT: [[TMP31:%.*]] = bitcast i32* [[TMP30]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_LOAD_3:%.*]] = load <16 x i32>, <16 x i32>* [[TMP31]], align 4			; AVX512-NEXT: [[WIDE_LOAD_3:%.*]] = load <16 x i32>, <16 x i32>* [[TMP31]], align 4
	; AVX512-NEXT: [[TMP32:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_3]], zeroinitializer			; AVX512-NEXT: [[TMP32:%.*]] = icmp sgt <16 x i32> [[WIDE_LOAD_3]], zeroinitializer
	; AVX512-NEXT: [[TMP33:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT_2]]			; AVX512-NEXT: [[TMP33:%.*]] = getelementptr i32, i32* [[INDEX]], i64 [[INDEX_NEXT_2]]
	; AVX512-NEXT: [[TMP34:%.*]] = bitcast i32* [[TMP33]] to <16 x i32>*			; AVX512-NEXT: [[TMP34:%.*]] = bitcast i32* [[TMP33]] to <16 x i32>*
	; AVX512-NEXT: [[WIDE_MASKED_LOAD_3:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP34]], i32 4, <16 x i1> [[TMP32]], <16 x i32> poison)			; AVX512-NEXT: [[WIDE_MASKED_LOAD_3:%.*]] = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* [[TMP34]], i32 4, <16 x i1> [[TMP32]], <16 x i32> poison)
	; AVX512-NEXT: [[TMP35:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_3]] to <16 x i64>			; AVX512-NEXT: [[TMP35:%.*]] = sext <16 x i32> [[WIDE_MASKED_LOAD_3]] to <16 x i64>
	; AVX512-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP35]]			; AVX512-NEXT: [[TMP36:%.*]] = getelementptr inbounds float, float* [[IN]], <16 x i64> [[TMP35]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER_3:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP36]], i32 4, <16 x i1> [[TMP32]], <16 x float> undef)			; AVX512-NEXT: [[WIDE_MASKED_GATHER_3:%.*]] = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> [[TMP36]], i32 4, <16 x i1> [[TMP32]], <16 x float> undef)
	; AVX512-NEXT: [[TMP37:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_3]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>			; AVX512-NEXT: [[TMP37:%.*]] = fadd <16 x float> [[WIDE_MASKED_GATHER_3]], <float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01, float 5.000000e-01>
	; AVX512-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_2]]			; AVX512-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[OUT]], i64 [[INDEX_NEXT_2]]
	; AVX512-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <16 x float>*			; AVX512-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <16 x float>*
	; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP39]], i32 4, <16 x i1> [[TMP32]])			; AVX512-NEXT: call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> [[TMP37]], <16 x float>* [[TMP39]], i32 4, <16 x i1> [[TMP32]])
	; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX8]], 64			; AVX512-NEXT: [[INDEX_NEXT_3]] = add nuw nsw i64 [[INDEX7]], 64
	; AVX512-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT_3]], 4096			; AVX512-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT_3]], 4096
	; AVX512-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; AVX512-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; AVX512: for.end:			; AVX512: for.end:
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo1(			; FVW2-LABEL: @foo1(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX17:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE27:%.*]] ]
	; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX17]]			; FVW2-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[INDEX7]]
	; FVW2-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>*			; FVW2-NEXT: [[TMP1:%.*]] = bitcast i32* [[TMP0]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4			; FVW2-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2			; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 2
	; FVW2-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <2 x i32>*			; FVW2-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_LOAD8:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4			; FVW2-NEXT: [[WIDE_LOAD8:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4
	; FVW2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4			; FVW2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 4
	; FVW2-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <2 x i32>*			; FVW2-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_LOAD9:%.*]] = load <2 x i32>, <2 x i32>* [[TMP5]], align 4			; FVW2-NEXT: [[WIDE_LOAD9:%.*]] = load <2 x i32>, <2 x i32>* [[TMP5]], align 4
	; FVW2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6			; FVW2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
	; FVW2-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*			; FVW2-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_LOAD10:%.*]] = load <2 x i32>, <2 x i32>* [[TMP7]], align 4			; FVW2-NEXT: [[WIDE_LOAD10:%.*]] = load <2 x i32>, <2 x i32>* [[TMP7]], align 4
	; FVW2-NEXT: [[TMP8:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer			; FVW2-NEXT: [[TMP8:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD]], zeroinitializer
	; FVW2-NEXT: [[TMP9:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD8]], zeroinitializer			; FVW2-NEXT: [[TMP9:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD8]], zeroinitializer
	; FVW2-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD9]], zeroinitializer			; FVW2-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD9]], zeroinitializer
	; FVW2-NEXT: [[TMP11:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD10]], zeroinitializer			; FVW2-NEXT: [[TMP11:%.*]] = icmp sgt <2 x i32> [[WIDE_LOAD10]], zeroinitializer
	; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX17]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[INDEX:%.*]], i64 [[INDEX7]]
	; FVW2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>*			; FVW2-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP13]], i32 4, <2 x i1> [[TMP8]], <2 x i32> poison)			; FVW2-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP13]], i32 4, <2 x i1> [[TMP8]], <2 x i32> poison)
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP12]], i64 2			; FVW2-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP12]], i64 2
	; FVW2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <2 x i32>*			; FVW2-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP15]], i32 4, <2 x i1> [[TMP9]], <2 x i32> poison)			; FVW2-NEXT: [[WIDE_MASKED_LOAD11:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP15]], i32 4, <2 x i1> [[TMP9]], <2 x i32> poison)
	; FVW2-NEXT: [[TMP16:%.*]] = getelementptr i32, i32* [[TMP12]], i64 4			; FVW2-NEXT: [[TMP16:%.*]] = getelementptr i32, i32* [[TMP12]], i64 4
	; FVW2-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*			; FVW2-NEXT: [[TMP17:%.*]] = bitcast i32* [[TMP16]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_MASKED_LOAD12:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP17]], i32 4, <2 x i1> [[TMP10]], <2 x i32> poison)			; FVW2-NEXT: [[WIDE_MASKED_LOAD12:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP17]], i32 4, <2 x i1> [[TMP10]], <2 x i32> poison)
	; FVW2-NEXT: [[TMP18:%.*]] = getelementptr i32, i32* [[TMP12]], i64 6			; FVW2-NEXT: [[TMP18:%.*]] = getelementptr i32, i32* [[TMP12]], i64 6
	; FVW2-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <2 x i32>*			; FVW2-NEXT: [[TMP19:%.*]] = bitcast i32* [[TMP18]] to <2 x i32>*
	; FVW2-NEXT: [[WIDE_MASKED_LOAD13:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP19]], i32 4, <2 x i1> [[TMP11]], <2 x i32> poison)			; FVW2-NEXT: [[WIDE_MASKED_LOAD13:%.*]] = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* [[TMP19]], i32 4, <2 x i1> [[TMP11]], <2 x i32> poison)
	; FVW2-NEXT: [[TMP20:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD]] to <2 x i64>			; FVW2-NEXT: [[TMP20:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD]] to <2 x i64>
	; FVW2-NEXT: [[TMP21:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD11]] to <2 x i64>			; FVW2-NEXT: [[TMP21:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD11]] to <2 x i64>
	; FVW2-NEXT: [[TMP22:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD12]] to <2 x i64>			; FVW2-NEXT: [[TMP22:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD12]] to <2 x i64>
	; FVW2-NEXT: [[TMP23:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD13]] to <2 x i64>			; FVW2-NEXT: [[TMP23:%.*]] = sext <2 x i32> [[WIDE_MASKED_LOAD13]] to <2 x i64>
	; FVW2-NEXT: [[TMP24:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], <2 x i64> [[TMP20]]			; FVW2-NEXT: [[TMP24:%.*]] = extractelement <2 x i1> [[TMP8]], i64 0
	; FVW2-NEXT: [[TMP25:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP21]]			; FVW2-NEXT: br i1 [[TMP24]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP22]]			; FVW2: pred.load.if:
	; FVW2-NEXT: [[TMP27:%.*]] = getelementptr inbounds float, float* [[IN]], <2 x i64> [[TMP23]]			; FVW2-NEXT: [[TMP25:%.*]] = extractelement <2 x i64> [[TMP20]], i64 0
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP24]], i32 4, <2 x i1> [[TMP8]], <2 x float> undef)			; FVW2-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[IN:%.*]], i64 [[TMP25]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER14:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP25]], i32 4, <2 x i1> [[TMP9]], <2 x float> undef)			; FVW2-NEXT: [[TMP27:%.*]] = load float, float* [[TMP26]], align 4
	; FVW2-NEXT: [[WIDE_MASKED_GATHER15:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP26]], i32 4, <2 x i1> [[TMP10]], <2 x float> undef)			; FVW2-NEXT: [[TMP28:%.*]] = insertelement <2 x float> poison, float [[TMP27]], i64 0
	; FVW2-NEXT: [[WIDE_MASKED_GATHER16:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP27]], i32 4, <2 x i1> [[TMP11]], <2 x float> undef)			; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
	; FVW2-NEXT: [[TMP28:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2: pred.load.continue:
	; FVW2-NEXT: [[TMP29:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER14]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP29:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP28]], [[PRED_LOAD_IF]] ]
	; FVW2-NEXT: [[TMP30:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER15]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP30:%.*]] = extractelement <2 x i1> [[TMP8]], i64 1
	; FVW2-NEXT: [[TMP31:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER16]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: br i1 [[TMP30]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]]
	; FVW2-NEXT: [[TMP32:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX17]]			; FVW2: pred.load.if14:
	; FVW2-NEXT: [[TMP33:%.*]] = bitcast float* [[TMP32]] to <2 x float>*			; FVW2-NEXT: [[TMP31:%.*]] = extractelement <2 x i64> [[TMP20]], i64 1
	; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP28]], <2 x float>* [[TMP33]], i32 4, <2 x i1> [[TMP8]])			; FVW2-NEXT: [[TMP32:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP31]]
	; FVW2-NEXT: [[TMP34:%.*]] = getelementptr float, float* [[TMP32]], i64 2			; FVW2-NEXT: [[TMP33:%.*]] = load float, float* [[TMP32]], align 4
	; FVW2-NEXT: [[TMP35:%.*]] = bitcast float* [[TMP34]] to <2 x float>*			; FVW2-NEXT: [[TMP34:%.*]] = insertelement <2 x float> [[TMP29]], float [[TMP33]], i64 1
	; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP29]], <2 x float>* [[TMP35]], i32 4, <2 x i1> [[TMP9]])			; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]]
	; FVW2-NEXT: [[TMP36:%.*]] = getelementptr float, float* [[TMP32]], i64 4			; FVW2: pred.load.continue15:
	; FVW2-NEXT: [[TMP37:%.*]] = bitcast float* [[TMP36]] to <2 x float>*			; FVW2-NEXT: [[TMP35:%.*]] = phi <2 x float> [ [[TMP29]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], [[PRED_LOAD_IF14]] ]
	; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP30]], <2 x float>* [[TMP37]], i32 4, <2 x i1> [[TMP10]])			; FVW2-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP9]], i64 0
	; FVW2-NEXT: [[TMP38:%.*]] = getelementptr float, float* [[TMP32]], i64 6			; FVW2-NEXT: br i1 [[TMP36]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]]
	; FVW2-NEXT: [[TMP39:%.*]] = bitcast float* [[TMP38]] to <2 x float>*			; FVW2: pred.load.if16:
	; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP31]], <2 x float>* [[TMP39]], i32 4, <2 x i1> [[TMP11]])			; FVW2-NEXT: [[TMP37:%.*]] = extractelement <2 x i64> [[TMP21]], i64 0
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX17]], 8			; FVW2-NEXT: [[TMP38:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP37]]
	; FVW2-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; FVW2-NEXT: [[TMP39:%.*]] = load float, float* [[TMP38]], align 4
	; FVW2-NEXT: br i1 [[TMP40]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; FVW2-NEXT: [[TMP40:%.*]] = insertelement <2 x float> poison, float [[TMP39]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]]
				; FVW2: pred.load.continue17:
				; FVW2-NEXT: [[TMP41:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE15]] ], [ [[TMP40]], [[PRED_LOAD_IF16]] ]
				; FVW2-NEXT: [[TMP42:%.*]] = extractelement <2 x i1> [[TMP9]], i64 1
				; FVW2-NEXT: br i1 [[TMP42]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]]
				; FVW2: pred.load.if18:
				; FVW2-NEXT: [[TMP43:%.*]] = extractelement <2 x i64> [[TMP21]], i64 1
				; FVW2-NEXT: [[TMP44:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP43]]
				; FVW2-NEXT: [[TMP45:%.*]] = load float, float* [[TMP44]], align 4
				; FVW2-NEXT: [[TMP46:%.*]] = insertelement <2 x float> [[TMP41]], float [[TMP45]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]]
				; FVW2: pred.load.continue19:
				; FVW2-NEXT: [[TMP47:%.*]] = phi <2 x float> [ [[TMP41]], [[PRED_LOAD_CONTINUE17]] ], [ [[TMP46]], [[PRED_LOAD_IF18]] ]
				; FVW2-NEXT: [[TMP48:%.*]] = extractelement <2 x i1> [[TMP10]], i64 0
				; FVW2-NEXT: br i1 [[TMP48]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]]
				; FVW2: pred.load.if20:
				; FVW2-NEXT: [[TMP49:%.*]] = extractelement <2 x i64> [[TMP22]], i64 0
				; FVW2-NEXT: [[TMP50:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP49]]
				; FVW2-NEXT: [[TMP51:%.*]] = load float, float* [[TMP50]], align 4
				; FVW2-NEXT: [[TMP52:%.*]] = insertelement <2 x float> poison, float [[TMP51]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]]
				; FVW2: pred.load.continue21:
				; FVW2-NEXT: [[TMP53:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE19]] ], [ [[TMP52]], [[PRED_LOAD_IF20]] ]
				; FVW2-NEXT: [[TMP54:%.*]] = extractelement <2 x i1> [[TMP10]], i64 1
				; FVW2-NEXT: br i1 [[TMP54]], label [[PRED_LOAD_IF22:%.*]], label [[PRED_LOAD_CONTINUE23:%.*]]
				; FVW2: pred.load.if22:
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i64> [[TMP22]], i64 1
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP55]]
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> [[TMP53]], float [[TMP57]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE23]]
				; FVW2: pred.load.continue23:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ [[TMP53]], [[PRED_LOAD_CONTINUE21]] ], [ [[TMP58]], [[PRED_LOAD_IF22]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP11]], i64 0
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF24:%.*]], label [[PRED_LOAD_CONTINUE25:%.*]]
				; FVW2: pred.load.if24:
				; FVW2-NEXT: [[TMP61:%.*]] = extractelement <2 x i64> [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP62:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP61]]
				; FVW2-NEXT: [[TMP63:%.*]] = load float, float* [[TMP62]], align 4
				; FVW2-NEXT: [[TMP64:%.*]] = insertelement <2 x float> poison, float [[TMP63]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE25]]
				; FVW2: pred.load.continue25:
				; FVW2-NEXT: [[TMP65:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE23]] ], [ [[TMP64]], [[PRED_LOAD_IF24]] ]
				; FVW2-NEXT: [[TMP66:%.*]] = extractelement <2 x i1> [[TMP11]], i64 1
				; FVW2-NEXT: br i1 [[TMP66]], label [[PRED_LOAD_IF26:%.*]], label [[PRED_LOAD_CONTINUE27]]
				; FVW2: pred.load.if26:
				; FVW2-NEXT: [[TMP67:%.*]] = extractelement <2 x i64> [[TMP23]], i64 1
				; FVW2-NEXT: [[TMP68:%.*]] = getelementptr inbounds float, float* [[IN]], i64 [[TMP67]]
				; FVW2-NEXT: [[TMP69:%.*]] = load float, float* [[TMP68]], align 4
				; FVW2-NEXT: [[TMP70:%.*]] = insertelement <2 x float> [[TMP65]], float [[TMP69]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE27]]
				; FVW2: pred.load.continue27:
				; FVW2-NEXT: [[TMP71:%.*]] = phi <2 x float> [ [[TMP65]], [[PRED_LOAD_CONTINUE25]] ], [ [[TMP70]], [[PRED_LOAD_IF26]] ]
				; FVW2-NEXT: [[TMP72:%.*]] = fadd <2 x float> [[TMP35]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP73:%.*]] = fadd <2 x float> [[TMP47]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP74:%.*]] = fadd <2 x float> [[TMP59]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP71]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = getelementptr float, float* [[OUT:%.*]], i64 [[INDEX7]]
				; FVW2-NEXT: [[TMP77:%.*]] = bitcast float* [[TMP76]] to <2 x float>*
				; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP72]], <2 x float>* [[TMP77]], i32 4, <2 x i1> [[TMP8]])
				; FVW2-NEXT: [[TMP78:%.*]] = getelementptr float, float* [[TMP76]], i64 2
				; FVW2-NEXT: [[TMP79:%.*]] = bitcast float* [[TMP78]] to <2 x float>*
				; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP73]], <2 x float>* [[TMP79]], i32 4, <2 x i1> [[TMP9]])
				; FVW2-NEXT: [[TMP80:%.*]] = getelementptr float, float* [[TMP76]], i64 4
				; FVW2-NEXT: [[TMP81:%.*]] = bitcast float* [[TMP80]] to <2 x float>*
				; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP74]], <2 x float>* [[TMP81]], i32 4, <2 x i1> [[TMP10]])
				; FVW2-NEXT: [[TMP82:%.*]] = getelementptr float, float* [[TMP76]], i64 6
				; FVW2-NEXT: [[TMP83:%.*]] = bitcast float* [[TMP82]] to <2 x float>*
				; FVW2-NEXT: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> [[TMP75]], <2 x float>* [[TMP83]], i32 4, <2 x i1> [[TMP11]])
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8
				; FVW2-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
				; FVW2-NEXT: br i1 [[TMP84]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca float*, align 8			%in.addr = alloca float*, align 8
	%out.addr = alloca float*, align 8			%out.addr = alloca float*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%index.addr = alloca i32*, align 8			%index.addr = alloca i32*, align 8
	; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>			; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>
	; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])			; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo2(			; FVW2-LABEL: @foo2(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ]			; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ]
	; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ]			; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4
	; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4
	; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16			; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16
	; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48
	; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4			; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64
	; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4			; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80
	; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0			; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96
	; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1			; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112
	; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer			; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1			; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef)			; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
	; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
	; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0			; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
	; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4
				; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4
				; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0
				; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1
				; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4
				; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4
				; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0
				; FVW2-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> [[TMP21]], i32 [[TMP20]], i64 1
				; FVW2-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP11]], align 4
				; FVW2-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP12]], align 4
				; FVW2-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP24]], i64 1
				; FVW2-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP13]], align 4
				; FVW2-NEXT: [[TMP28:%.*]] = load i32, i32* [[TMP14]], align 4
				; FVW2-NEXT: [[TMP29:%.*]] = insertelement <2 x i32> poison, i32 [[TMP27]], i64 0
				; FVW2-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> [[TMP29]], i32 [[TMP28]], i64 1
				; FVW2-NEXT: [[TMP31:%.*]] = icmp sgt <2 x i32> [[TMP18]], zeroinitializer
				; FVW2-NEXT: [[TMP32:%.*]] = icmp sgt <2 x i32> [[TMP22]], zeroinitializer
				; FVW2-NEXT: [[TMP33:%.*]] = icmp sgt <2 x i32> [[TMP26]], zeroinitializer
				; FVW2-NEXT: [[TMP34:%.*]] = icmp sgt <2 x i32> [[TMP30]], zeroinitializer
				; FVW2-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP35]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; FVW2: pred.load.if:
				; FVW2-NEXT: [[TMP36:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], i64 [[OFFSET_IDX]], i32 1
				; FVW2-NEXT: [[TMP37:%.*]] = load float, float* [[TMP36]], align 4
				; FVW2-NEXT: [[TMP38:%.*]] = insertelement <2 x float> poison, float [[TMP37]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; FVW2: pred.load.continue:
				; FVW2-NEXT: [[TMP39:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP38]], [[PRED_LOAD_IF]] ]
				; FVW2-NEXT: [[TMP40:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
				; FVW2-NEXT: br i1 [[TMP40]], label [[PRED_LOAD_IF8:%.*]], label [[PRED_LOAD_CONTINUE9:%.*]]
				; FVW2: pred.load.if8:
				; FVW2-NEXT: [[TMP41:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP0]], i32 1
				; FVW2-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4
				; FVW2-NEXT: [[TMP43:%.*]] = insertelement <2 x float> [[TMP39]], float [[TMP42]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE9]]
				; FVW2: pred.load.continue9:
				; FVW2-NEXT: [[TMP44:%.*]] = phi <2 x float> [ [[TMP39]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP43]], [[PRED_LOAD_IF8]] ]
				; FVW2-NEXT: [[TMP45:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
				; FVW2-NEXT: br i1 [[TMP45]], label [[PRED_LOAD_IF10:%.*]], label [[PRED_LOAD_CONTINUE11:%.*]]
				; FVW2: pred.load.if10:
				; FVW2-NEXT: [[TMP46:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP47:%.*]] = load float, float* [[TMP46]], align 4
				; FVW2-NEXT: [[TMP48:%.*]] = insertelement <2 x float> poison, float [[TMP47]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE11]]
				; FVW2: pred.load.continue11:
				; FVW2-NEXT: [[TMP49:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE9]] ], [ [[TMP48]], [[PRED_LOAD_IF10]] ]
				; FVW2-NEXT: [[TMP50:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP50]], label [[PRED_LOAD_IF12:%.*]], label [[PRED_LOAD_CONTINUE13:%.*]]
				; FVW2: pred.load.if12:
				; FVW2-NEXT: [[TMP51:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP52:%.*]] = load float, float* [[TMP51]], align 4
				; FVW2-NEXT: [[TMP53:%.*]] = insertelement <2 x float> [[TMP49]], float [[TMP52]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE13]]
				; FVW2: pred.load.continue13:
				; FVW2-NEXT: [[TMP54:%.*]] = phi <2 x float> [ [[TMP49]], [[PRED_LOAD_CONTINUE11]] ], [ [[TMP53]], [[PRED_LOAD_IF12]] ]
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP55]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]]
				; FVW2: pred.load.if14:
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> poison, float [[TMP57]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]]
				; FVW2: pred.load.continue15:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], [[PRED_LOAD_IF14]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]]
				; FVW2: pred.load.if16:
				; FVW2-NEXT: [[TMP61:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP62:%.*]] = load float, float* [[TMP61]], align 4
				; FVW2-NEXT: [[TMP63:%.*]] = insertelement <2 x float> [[TMP59]], float [[TMP62]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]]
				; FVW2: pred.load.continue17:
				; FVW2-NEXT: [[TMP64:%.*]] = phi <2 x float> [ [[TMP59]], [[PRED_LOAD_CONTINUE15]] ], [ [[TMP63]], [[PRED_LOAD_IF16]] ]
				; FVW2-NEXT: [[TMP65:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP65]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]]
				; FVW2: pred.load.if18:
				; FVW2-NEXT: [[TMP66:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP67:%.*]] = load float, float* [[TMP66]], align 4
				; FVW2-NEXT: [[TMP68:%.*]] = insertelement <2 x float> poison, float [[TMP67]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]]
				; FVW2: pred.load.continue19:
				; FVW2-NEXT: [[TMP69:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE17]] ], [ [[TMP68]], [[PRED_LOAD_IF18]] ]
				; FVW2-NEXT: [[TMP70:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP70]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]]
				; FVW2: pred.load.if20:
				; FVW2-NEXT: [[TMP71:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP72:%.*]] = load float, float* [[TMP71]], align 4
				; FVW2-NEXT: [[TMP73:%.*]] = insertelement <2 x float> [[TMP69]], float [[TMP72]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]]
				; FVW2: pred.load.continue21:
				; FVW2-NEXT: [[TMP74:%.*]] = phi <2 x float> [ [[TMP69]], [[PRED_LOAD_CONTINUE19]] ], [ [[TMP73]], [[PRED_LOAD_IF20]] ]
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP44]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = fadd <2 x float> [[TMP54]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP77:%.*]] = fadd <2 x float> [[TMP64]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP78:%.*]] = fadd <2 x float> [[TMP74]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP79:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP79]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; FVW2: pred.store.if:			; FVW2: pred.store.if:
	; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[OUT:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP80:%.*]] = getelementptr inbounds float, float* [[OUT:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i64 0			; FVW2-NEXT: [[TMP81:%.*]] = extractelement <2 x float> [[TMP75]], i64 0
	; FVW2-NEXT: store float [[TMP12]], float* [[TMP11]], align 4			; FVW2-NEXT: store float [[TMP81]], float* [[TMP80]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]
	; FVW2: pred.store.continue:			; FVW2: pred.store.continue:
	; FVW2-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP7]], i64 1			; FVW2-NEXT: [[TMP82:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
	; FVW2-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br i1 [[TMP82]], label [[PRED_STORE_IF22:%.*]], label [[PRED_STORE_CONTINUE23:%.*]]
	; FVW2: pred.store.if8:			; FVW2: pred.store.if22:
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP83:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP0]]
	; FVW2-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP9]], i64 1			; FVW2-NEXT: [[TMP84:%.*]] = extractelement <2 x float> [[TMP75]], i64 1
	; FVW2-NEXT: store float [[TMP15]], float* [[TMP14]], align 4			; FVW2-NEXT: store float [[TMP84]], float* [[TMP83]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE23]]
	; FVW2: pred.store.continue9:			; FVW2: pred.store.continue23:
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX10]], 2			; FVW2-NEXT: [[TMP85:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
	; FVW2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 32, i64 32>			; FVW2-NEXT: br i1 [[TMP85]], label [[PRED_STORE_IF24:%.*]], label [[PRED_STORE_CONTINUE25:%.*]]
	; FVW2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256			; FVW2: pred.store.if24:
	; FVW2-NEXT: br i1 [[TMP16]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; FVW2-NEXT: [[TMP86:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP1]]
				; FVW2-NEXT: [[TMP87:%.*]] = extractelement <2 x float> [[TMP76]], i64 0
				; FVW2-NEXT: store float [[TMP87]], float* [[TMP86]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE25]]
				; FVW2: pred.store.continue25:
				; FVW2-NEXT: [[TMP88:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP88]], label [[PRED_STORE_IF26:%.*]], label [[PRED_STORE_CONTINUE27:%.*]]
				; FVW2: pred.store.if26:
				; FVW2-NEXT: [[TMP89:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP2]]
				; FVW2-NEXT: [[TMP90:%.*]] = extractelement <2 x float> [[TMP76]], i64 1
				; FVW2-NEXT: store float [[TMP90]], float* [[TMP89]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE27]]
				; FVW2: pred.store.continue27:
				; FVW2-NEXT: [[TMP91:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP91]], label [[PRED_STORE_IF28:%.*]], label [[PRED_STORE_CONTINUE29:%.*]]
				; FVW2: pred.store.if28:
				; FVW2-NEXT: [[TMP92:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP3]]
				; FVW2-NEXT: [[TMP93:%.*]] = extractelement <2 x float> [[TMP77]], i64 0
				; FVW2-NEXT: store float [[TMP93]], float* [[TMP92]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE29]]
				; FVW2: pred.store.continue29:
				; FVW2-NEXT: [[TMP94:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP94]], label [[PRED_STORE_IF30:%.*]], label [[PRED_STORE_CONTINUE31:%.*]]
				; FVW2: pred.store.if30:
				; FVW2-NEXT: [[TMP95:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP96:%.*]] = extractelement <2 x float> [[TMP77]], i64 1
				; FVW2-NEXT: store float [[TMP96]], float* [[TMP95]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE31]]
				; FVW2: pred.store.continue31:
				; FVW2-NEXT: [[TMP97:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP97]], label [[PRED_STORE_IF32:%.*]], label [[PRED_STORE_CONTINUE33:%.*]]
				; FVW2: pred.store.if32:
				; FVW2-NEXT: [[TMP98:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP99:%.*]] = extractelement <2 x float> [[TMP78]], i64 0
				; FVW2-NEXT: store float [[TMP99]], float* [[TMP98]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE33]]
				; FVW2: pred.store.continue33:
				; FVW2-NEXT: [[TMP100:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP100]], label [[PRED_STORE_IF34:%.*]], label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.if34:
				; FVW2-NEXT: [[TMP101:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP102:%.*]] = extractelement <2 x float> [[TMP78]], i64 1
				; FVW2-NEXT: store float [[TMP102]], float* [[TMP101]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.continue35:
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8
				; FVW2-NEXT: [[TMP103:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
				; FVW2-NEXT: br i1 [[TMP103]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca %struct.In*, align 8			%in.addr = alloca %struct.In*, align 8
	%out.addr = alloca float*, align 8			%out.addr = alloca float*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%index.addr = alloca i32*, align 8			%index.addr = alloca i32*, align 8
	; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>, i32 1			; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>, i32 1
	; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])			; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo3(			; FVW2-LABEL: @foo3(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE8:%.*]] ]			; FVW2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE34:%.*]] ]
	; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE8]] ]
	; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 4			; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 4
	; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16			; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16
	; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48
	; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4			; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64
	; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4			; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80
	; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0			; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96
	; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1			; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112
	; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer			; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1			; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef)			; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
	; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
	; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0			; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
	; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4
				; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4
				; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0
				; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1
				; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4
				; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4
				; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0
				; FVW2-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> [[TMP21]], i32 [[TMP20]], i64 1
				; FVW2-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP11]], align 4
				; FVW2-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP12]], align 4
				; FVW2-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP24]], i64 1
				; FVW2-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP13]], align 4
				; FVW2-NEXT: [[TMP28:%.*]] = load i32, i32* [[TMP14]], align 4
				; FVW2-NEXT: [[TMP29:%.*]] = insertelement <2 x i32> poison, i32 [[TMP27]], i64 0
				; FVW2-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> [[TMP29]], i32 [[TMP28]], i64 1
				; FVW2-NEXT: [[TMP31:%.*]] = icmp sgt <2 x i32> [[TMP18]], zeroinitializer
				; FVW2-NEXT: [[TMP32:%.*]] = icmp sgt <2 x i32> [[TMP22]], zeroinitializer
				; FVW2-NEXT: [[TMP33:%.*]] = icmp sgt <2 x i32> [[TMP26]], zeroinitializer
				; FVW2-NEXT: [[TMP34:%.*]] = icmp sgt <2 x i32> [[TMP30]], zeroinitializer
				; FVW2-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP35]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; FVW2: pred.load.if:
				; FVW2-NEXT: [[TMP36:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], i64 [[OFFSET_IDX]], i32 1
				; FVW2-NEXT: [[TMP37:%.*]] = load float, float* [[TMP36]], align 4
				; FVW2-NEXT: [[TMP38:%.*]] = insertelement <2 x float> poison, float [[TMP37]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; FVW2: pred.load.continue:
				; FVW2-NEXT: [[TMP39:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP38]], [[PRED_LOAD_IF]] ]
				; FVW2-NEXT: [[TMP40:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
				; FVW2-NEXT: br i1 [[TMP40]], label [[PRED_LOAD_IF7:%.*]], label [[PRED_LOAD_CONTINUE8:%.*]]
				; FVW2: pred.load.if7:
				; FVW2-NEXT: [[TMP41:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP0]], i32 1
				; FVW2-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4
				; FVW2-NEXT: [[TMP43:%.*]] = insertelement <2 x float> [[TMP39]], float [[TMP42]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE8]]
				; FVW2: pred.load.continue8:
				; FVW2-NEXT: [[TMP44:%.*]] = phi <2 x float> [ [[TMP39]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP43]], [[PRED_LOAD_IF7]] ]
				; FVW2-NEXT: [[TMP45:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
				; FVW2-NEXT: br i1 [[TMP45]], label [[PRED_LOAD_IF9:%.*]], label [[PRED_LOAD_CONTINUE10:%.*]]
				; FVW2: pred.load.if9:
				; FVW2-NEXT: [[TMP46:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP47:%.*]] = load float, float* [[TMP46]], align 4
				; FVW2-NEXT: [[TMP48:%.*]] = insertelement <2 x float> poison, float [[TMP47]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE10]]
				; FVW2: pred.load.continue10:
				; FVW2-NEXT: [[TMP49:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE8]] ], [ [[TMP48]], [[PRED_LOAD_IF9]] ]
				; FVW2-NEXT: [[TMP50:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP50]], label [[PRED_LOAD_IF11:%.*]], label [[PRED_LOAD_CONTINUE12:%.*]]
				; FVW2: pred.load.if11:
				; FVW2-NEXT: [[TMP51:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP52:%.*]] = load float, float* [[TMP51]], align 4
				; FVW2-NEXT: [[TMP53:%.*]] = insertelement <2 x float> [[TMP49]], float [[TMP52]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE12]]
				; FVW2: pred.load.continue12:
				; FVW2-NEXT: [[TMP54:%.*]] = phi <2 x float> [ [[TMP49]], [[PRED_LOAD_CONTINUE10]] ], [ [[TMP53]], [[PRED_LOAD_IF11]] ]
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP55]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]]
				; FVW2: pred.load.if13:
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> poison, float [[TMP57]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE14]]
				; FVW2: pred.load.continue14:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE12]] ], [ [[TMP58]], [[PRED_LOAD_IF13]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]]
				; FVW2: pred.load.if15:
				; FVW2-NEXT: [[TMP61:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP62:%.*]] = load float, float* [[TMP61]], align 4
				; FVW2-NEXT: [[TMP63:%.*]] = insertelement <2 x float> [[TMP59]], float [[TMP62]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE16]]
				; FVW2: pred.load.continue16:
				; FVW2-NEXT: [[TMP64:%.*]] = phi <2 x float> [ [[TMP59]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP63]], [[PRED_LOAD_IF15]] ]
				; FVW2-NEXT: [[TMP65:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP65]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18:%.*]]
				; FVW2: pred.load.if17:
				; FVW2-NEXT: [[TMP66:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP67:%.*]] = load float, float* [[TMP66]], align 4
				; FVW2-NEXT: [[TMP68:%.*]] = insertelement <2 x float> poison, float [[TMP67]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE18]]
				; FVW2: pred.load.continue18:
				; FVW2-NEXT: [[TMP69:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE16]] ], [ [[TMP68]], [[PRED_LOAD_IF17]] ]
				; FVW2-NEXT: [[TMP70:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP70]], label [[PRED_LOAD_IF19:%.*]], label [[PRED_LOAD_CONTINUE20:%.*]]
				; FVW2: pred.load.if19:
				; FVW2-NEXT: [[TMP71:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP72:%.*]] = load float, float* [[TMP71]], align 4
				; FVW2-NEXT: [[TMP73:%.*]] = insertelement <2 x float> [[TMP69]], float [[TMP72]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE20]]
				; FVW2: pred.load.continue20:
				; FVW2-NEXT: [[TMP74:%.*]] = phi <2 x float> [ [[TMP69]], [[PRED_LOAD_CONTINUE18]] ], [ [[TMP73]], [[PRED_LOAD_IF19]] ]
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP44]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = fadd <2 x float> [[TMP54]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP77:%.*]] = fadd <2 x float> [[TMP64]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP78:%.*]] = fadd <2 x float> [[TMP74]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP79:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP79]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; FVW2: pred.store.if:			; FVW2: pred.store.if:
	; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[STRUCT_OUT:%.*]], %struct.Out* [[OUT:%.*]], i64 [[OFFSET_IDX]], i32 1			; FVW2-NEXT: [[TMP80:%.*]] = getelementptr inbounds [[STRUCT_OUT:%.*]], %struct.Out* [[OUT:%.*]], i64 [[OFFSET_IDX]], i32 1
	; FVW2-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i64 0			; FVW2-NEXT: [[TMP81:%.*]] = extractelement <2 x float> [[TMP75]], i64 0
	; FVW2-NEXT: store float [[TMP12]], float* [[TMP11]], align 4			; FVW2-NEXT: store float [[TMP81]], float* [[TMP80]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]
	; FVW2: pred.store.continue:			; FVW2: pred.store.continue:
	; FVW2-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP7]], i64 1			; FVW2-NEXT: [[TMP82:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
	; FVW2-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8]]			; FVW2-NEXT: br i1 [[TMP82]], label [[PRED_STORE_IF21:%.*]], label [[PRED_STORE_CONTINUE22:%.*]]
	; FVW2: pred.store.if7:			; FVW2: pred.store.if21:
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP0]], i32 1			; FVW2-NEXT: [[TMP83:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP0]], i32 1
	; FVW2-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP9]], i64 1			; FVW2-NEXT: [[TMP84:%.*]] = extractelement <2 x float> [[TMP75]], i64 1
	; FVW2-NEXT: store float [[TMP15]], float* [[TMP14]], align 4			; FVW2-NEXT: store float [[TMP84]], float* [[TMP83]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE8]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE22]]
	; FVW2: pred.store.continue8:			; FVW2: pred.store.continue22:
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; FVW2-NEXT: [[TMP85:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
	; FVW2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 32, i64 32>			; FVW2-NEXT: br i1 [[TMP85]], label [[PRED_STORE_IF23:%.*]], label [[PRED_STORE_CONTINUE24:%.*]]
	; FVW2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256			; FVW2: pred.store.if23:
	; FVW2-NEXT: br i1 [[TMP16]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; FVW2-NEXT: [[TMP86:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP87:%.*]] = extractelement <2 x float> [[TMP76]], i64 0
				; FVW2-NEXT: store float [[TMP87]], float* [[TMP86]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE24]]
				; FVW2: pred.store.continue24:
				; FVW2-NEXT: [[TMP88:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP88]], label [[PRED_STORE_IF25:%.*]], label [[PRED_STORE_CONTINUE26:%.*]]
				; FVW2: pred.store.if25:
				; FVW2-NEXT: [[TMP89:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP90:%.*]] = extractelement <2 x float> [[TMP76]], i64 1
				; FVW2-NEXT: store float [[TMP90]], float* [[TMP89]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE26]]
				; FVW2: pred.store.continue26:
				; FVW2-NEXT: [[TMP91:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP91]], label [[PRED_STORE_IF27:%.*]], label [[PRED_STORE_CONTINUE28:%.*]]
				; FVW2: pred.store.if27:
				; FVW2-NEXT: [[TMP92:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP93:%.*]] = extractelement <2 x float> [[TMP77]], i64 0
				; FVW2-NEXT: store float [[TMP93]], float* [[TMP92]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE28]]
				; FVW2: pred.store.continue28:
				; FVW2-NEXT: [[TMP94:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP94]], label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
				; FVW2: pred.store.if29:
				; FVW2-NEXT: [[TMP95:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP96:%.*]] = extractelement <2 x float> [[TMP77]], i64 1
				; FVW2-NEXT: store float [[TMP96]], float* [[TMP95]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE30]]
				; FVW2: pred.store.continue30:
				; FVW2-NEXT: [[TMP97:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP97]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]
				; FVW2: pred.store.if31:
				; FVW2-NEXT: [[TMP98:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP99:%.*]] = extractelement <2 x float> [[TMP78]], i64 0
				; FVW2-NEXT: store float [[TMP99]], float* [[TMP98]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE32]]
				; FVW2: pred.store.continue32:
				; FVW2-NEXT: [[TMP100:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP100]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34]]
				; FVW2: pred.store.if33:
				; FVW2-NEXT: [[TMP101:%.*]] = getelementptr inbounds [[STRUCT_OUT]], %struct.Out* [[OUT]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP102:%.*]] = extractelement <2 x float> [[TMP78]], i64 1
				; FVW2-NEXT: store float [[TMP102]], float* [[TMP101]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE34]]
				; FVW2: pred.store.continue34:
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
				; FVW2-NEXT: [[TMP103:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
				; FVW2-NEXT: br i1 [[TMP103]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca %struct.In*, align 8			%in.addr = alloca %struct.In*, align 8
	%out.addr = alloca %struct.Out*, align 8			%out.addr = alloca %struct.Out*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%i = alloca i32, align 4			%i = alloca i32, align 4
	; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>			; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>
	; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p1f32(<16 x float> [[TMP78]], <16 x float addrspace(1)*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])			; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p1f32(<16 x float> [[TMP78]], <16 x float addrspace(1)*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo2_addrspace(			; FVW2-LABEL: @foo2_addrspace(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ]			; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ]
	; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ]			; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4
	; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4
	; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16			; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16
	; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48
	; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4			; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64
	; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4			; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80
	; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0			; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96
	; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1			; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112
	; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer			; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], [[STRUCT_IN]] addrspace(1)* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1			; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p1f32(<2 x float addrspace(1)*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef)			; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
	; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
	; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0			; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
	; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4
				; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4
				; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0
				; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1
				; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4
				; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4
				; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0
				; FVW2-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> [[TMP21]], i32 [[TMP20]], i64 1
				; FVW2-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP11]], align 4
				; FVW2-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP12]], align 4
				; FVW2-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP24]], i64 1
				; FVW2-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP13]], align 4
				; FVW2-NEXT: [[TMP28:%.*]] = load i32, i32* [[TMP14]], align 4
				; FVW2-NEXT: [[TMP29:%.*]] = insertelement <2 x i32> poison, i32 [[TMP27]], i64 0
				; FVW2-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> [[TMP29]], i32 [[TMP28]], i64 1
				; FVW2-NEXT: [[TMP31:%.*]] = icmp sgt <2 x i32> [[TMP18]], zeroinitializer
				; FVW2-NEXT: [[TMP32:%.*]] = icmp sgt <2 x i32> [[TMP22]], zeroinitializer
				; FVW2-NEXT: [[TMP33:%.*]] = icmp sgt <2 x i32> [[TMP26]], zeroinitializer
				; FVW2-NEXT: [[TMP34:%.*]] = icmp sgt <2 x i32> [[TMP30]], zeroinitializer
				; FVW2-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP35]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; FVW2: pred.load.if:
				; FVW2-NEXT: [[TMP36:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], [[STRUCT_IN]] addrspace(1)* [[IN:%.*]], i64 [[OFFSET_IDX]], i32 1
				; FVW2-NEXT: [[TMP37:%.*]] = load float, float addrspace(1)* [[TMP36]], align 4
				; FVW2-NEXT: [[TMP38:%.*]] = insertelement <2 x float> poison, float [[TMP37]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; FVW2: pred.load.continue:
				; FVW2-NEXT: [[TMP39:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP38]], [[PRED_LOAD_IF]] ]
				; FVW2-NEXT: [[TMP40:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
				; FVW2-NEXT: br i1 [[TMP40]], label [[PRED_LOAD_IF8:%.*]], label [[PRED_LOAD_CONTINUE9:%.*]]
				; FVW2: pred.load.if8:
				; FVW2-NEXT: [[TMP41:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP0]], i32 1
				; FVW2-NEXT: [[TMP42:%.*]] = load float, float addrspace(1)* [[TMP41]], align 4
				; FVW2-NEXT: [[TMP43:%.*]] = insertelement <2 x float> [[TMP39]], float [[TMP42]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE9]]
				; FVW2: pred.load.continue9:
				; FVW2-NEXT: [[TMP44:%.*]] = phi <2 x float> [ [[TMP39]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP43]], [[PRED_LOAD_IF8]] ]
				; FVW2-NEXT: [[TMP45:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
				; FVW2-NEXT: br i1 [[TMP45]], label [[PRED_LOAD_IF10:%.*]], label [[PRED_LOAD_CONTINUE11:%.*]]
				; FVW2: pred.load.if10:
				; FVW2-NEXT: [[TMP46:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP47:%.*]] = load float, float addrspace(1)* [[TMP46]], align 4
				; FVW2-NEXT: [[TMP48:%.*]] = insertelement <2 x float> poison, float [[TMP47]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE11]]
				; FVW2: pred.load.continue11:
				; FVW2-NEXT: [[TMP49:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE9]] ], [ [[TMP48]], [[PRED_LOAD_IF10]] ]
				; FVW2-NEXT: [[TMP50:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP50]], label [[PRED_LOAD_IF12:%.*]], label [[PRED_LOAD_CONTINUE13:%.*]]
				; FVW2: pred.load.if12:
				; FVW2-NEXT: [[TMP51:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP52:%.*]] = load float, float addrspace(1)* [[TMP51]], align 4
				; FVW2-NEXT: [[TMP53:%.*]] = insertelement <2 x float> [[TMP49]], float [[TMP52]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE13]]
				; FVW2: pred.load.continue13:
				; FVW2-NEXT: [[TMP54:%.*]] = phi <2 x float> [ [[TMP49]], [[PRED_LOAD_CONTINUE11]] ], [ [[TMP53]], [[PRED_LOAD_IF12]] ]
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP55]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]]
				; FVW2: pred.load.if14:
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float addrspace(1)* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> poison, float [[TMP57]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]]
				; FVW2: pred.load.continue15:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], [[PRED_LOAD_IF14]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]]
				; FVW2: pred.load.if16:
				; FVW2-NEXT: [[TMP61:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP62:%.*]] = load float, float addrspace(1)* [[TMP61]], align 4
				; FVW2-NEXT: [[TMP63:%.*]] = insertelement <2 x float> [[TMP59]], float [[TMP62]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]]
				; FVW2: pred.load.continue17:
				; FVW2-NEXT: [[TMP64:%.*]] = phi <2 x float> [ [[TMP59]], [[PRED_LOAD_CONTINUE15]] ], [ [[TMP63]], [[PRED_LOAD_IF16]] ]
				; FVW2-NEXT: [[TMP65:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP65]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]]
				; FVW2: pred.load.if18:
				; FVW2-NEXT: [[TMP66:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP67:%.*]] = load float, float addrspace(1)* [[TMP66]], align 4
				; FVW2-NEXT: [[TMP68:%.*]] = insertelement <2 x float> poison, float [[TMP67]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]]
				; FVW2: pred.load.continue19:
				; FVW2-NEXT: [[TMP69:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE17]] ], [ [[TMP68]], [[PRED_LOAD_IF18]] ]
				; FVW2-NEXT: [[TMP70:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP70]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]]
				; FVW2: pred.load.if20:
				; FVW2-NEXT: [[TMP71:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP72:%.*]] = load float, float addrspace(1)* [[TMP71]], align 4
				; FVW2-NEXT: [[TMP73:%.*]] = insertelement <2 x float> [[TMP69]], float [[TMP72]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]]
				; FVW2: pred.load.continue21:
				; FVW2-NEXT: [[TMP74:%.*]] = phi <2 x float> [ [[TMP69]], [[PRED_LOAD_CONTINUE19]] ], [ [[TMP73]], [[PRED_LOAD_IF20]] ]
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP44]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = fadd <2 x float> [[TMP54]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP77:%.*]] = fadd <2 x float> [[TMP64]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP78:%.*]] = fadd <2 x float> [[TMP74]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP79:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP79]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; FVW2: pred.store.if:			; FVW2: pred.store.if:
	; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP80:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i64 0			; FVW2-NEXT: [[TMP81:%.*]] = extractelement <2 x float> [[TMP75]], i64 0
	; FVW2-NEXT: store float [[TMP12]], float addrspace(1)* [[TMP11]], align 4			; FVW2-NEXT: store float [[TMP81]], float addrspace(1)* [[TMP80]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]
	; FVW2: pred.store.continue:			; FVW2: pred.store.continue:
	; FVW2-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP7]], i64 1			; FVW2-NEXT: [[TMP82:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
	; FVW2-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br i1 [[TMP82]], label [[PRED_STORE_IF22:%.*]], label [[PRED_STORE_CONTINUE23:%.*]]
	; FVW2: pred.store.if8:			; FVW2: pred.store.if22:
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP83:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP0]]
	; FVW2-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP9]], i64 1			; FVW2-NEXT: [[TMP84:%.*]] = extractelement <2 x float> [[TMP75]], i64 1
	; FVW2-NEXT: store float [[TMP15]], float addrspace(1)* [[TMP14]], align 4			; FVW2-NEXT: store float [[TMP84]], float addrspace(1)* [[TMP83]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE23]]
	; FVW2: pred.store.continue9:			; FVW2: pred.store.continue23:
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX10]], 2			; FVW2-NEXT: [[TMP85:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
	; FVW2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 32, i64 32>			; FVW2-NEXT: br i1 [[TMP85]], label [[PRED_STORE_IF24:%.*]], label [[PRED_STORE_CONTINUE25:%.*]]
	; FVW2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256			; FVW2: pred.store.if24:
	; FVW2-NEXT: br i1 [[TMP16]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; FVW2-NEXT: [[TMP86:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP1]]
				; FVW2-NEXT: [[TMP87:%.*]] = extractelement <2 x float> [[TMP76]], i64 0
				; FVW2-NEXT: store float [[TMP87]], float addrspace(1)* [[TMP86]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE25]]
				; FVW2: pred.store.continue25:
				; FVW2-NEXT: [[TMP88:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP88]], label [[PRED_STORE_IF26:%.*]], label [[PRED_STORE_CONTINUE27:%.*]]
				; FVW2: pred.store.if26:
				; FVW2-NEXT: [[TMP89:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP2]]
				; FVW2-NEXT: [[TMP90:%.*]] = extractelement <2 x float> [[TMP76]], i64 1
				; FVW2-NEXT: store float [[TMP90]], float addrspace(1)* [[TMP89]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE27]]
				; FVW2: pred.store.continue27:
				; FVW2-NEXT: [[TMP91:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP91]], label [[PRED_STORE_IF28:%.*]], label [[PRED_STORE_CONTINUE29:%.*]]
				; FVW2: pred.store.if28:
				; FVW2-NEXT: [[TMP92:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP3]]
				; FVW2-NEXT: [[TMP93:%.*]] = extractelement <2 x float> [[TMP77]], i64 0
				; FVW2-NEXT: store float [[TMP93]], float addrspace(1)* [[TMP92]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE29]]
				; FVW2: pred.store.continue29:
				; FVW2-NEXT: [[TMP94:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP94]], label [[PRED_STORE_IF30:%.*]], label [[PRED_STORE_CONTINUE31:%.*]]
				; FVW2: pred.store.if30:
				; FVW2-NEXT: [[TMP95:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP96:%.*]] = extractelement <2 x float> [[TMP77]], i64 1
				; FVW2-NEXT: store float [[TMP96]], float addrspace(1)* [[TMP95]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE31]]
				; FVW2: pred.store.continue31:
				; FVW2-NEXT: [[TMP97:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP97]], label [[PRED_STORE_IF32:%.*]], label [[PRED_STORE_CONTINUE33:%.*]]
				; FVW2: pred.store.if32:
				; FVW2-NEXT: [[TMP98:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP99:%.*]] = extractelement <2 x float> [[TMP78]], i64 0
				; FVW2-NEXT: store float [[TMP99]], float addrspace(1)* [[TMP98]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE33]]
				; FVW2: pred.store.continue33:
				; FVW2-NEXT: [[TMP100:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP100]], label [[PRED_STORE_IF34:%.*]], label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.if34:
				; FVW2-NEXT: [[TMP101:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP102:%.*]] = extractelement <2 x float> [[TMP78]], i64 1
				; FVW2-NEXT: store float [[TMP102]], float addrspace(1)* [[TMP101]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.continue35:
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8
				; FVW2-NEXT: [[TMP103:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
				; FVW2-NEXT: br i1 [[TMP103]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca %struct.In addrspace(1)*, align 8			%in.addr = alloca %struct.In addrspace(1)*, align 8
	%out.addr = alloca float addrspace(1)*, align 8			%out.addr = alloca float addrspace(1)*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%index.addr = alloca i32*, align 8			%index.addr = alloca i32*, align 8
	; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>			; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>
	; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])			; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p0f32(<16 x float> [[TMP78]], <16 x float*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo2_addrspace2(			; FVW2-LABEL: @foo2_addrspace2(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ]			; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ]
	; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ]			; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4
	; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4
	; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16			; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16
	; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48
	; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4			; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64
	; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4			; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80
	; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0			; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96
	; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1			; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112
	; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer			; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], [[STRUCT_IN]] addrspace(1)* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1			; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p1f32(<2 x float addrspace(1)*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef)			; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
	; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
	; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0			; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
	; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4
				; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4
				; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0
				; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1
				; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4
				; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4
				; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0
				; FVW2-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> [[TMP21]], i32 [[TMP20]], i64 1
				; FVW2-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP11]], align 4
				; FVW2-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP12]], align 4
				; FVW2-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP24]], i64 1
				; FVW2-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP13]], align 4
				; FVW2-NEXT: [[TMP28:%.*]] = load i32, i32* [[TMP14]], align 4
				; FVW2-NEXT: [[TMP29:%.*]] = insertelement <2 x i32> poison, i32 [[TMP27]], i64 0
				; FVW2-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> [[TMP29]], i32 [[TMP28]], i64 1
				; FVW2-NEXT: [[TMP31:%.*]] = icmp sgt <2 x i32> [[TMP18]], zeroinitializer
				; FVW2-NEXT: [[TMP32:%.*]] = icmp sgt <2 x i32> [[TMP22]], zeroinitializer
				; FVW2-NEXT: [[TMP33:%.*]] = icmp sgt <2 x i32> [[TMP26]], zeroinitializer
				; FVW2-NEXT: [[TMP34:%.*]] = icmp sgt <2 x i32> [[TMP30]], zeroinitializer
				; FVW2-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP35]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; FVW2: pred.load.if:
				; FVW2-NEXT: [[TMP36:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], [[STRUCT_IN]] addrspace(1)* [[IN:%.*]], i64 [[OFFSET_IDX]], i32 1
				; FVW2-NEXT: [[TMP37:%.*]] = load float, float addrspace(1)* [[TMP36]], align 4
				; FVW2-NEXT: [[TMP38:%.*]] = insertelement <2 x float> poison, float [[TMP37]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; FVW2: pred.load.continue:
				; FVW2-NEXT: [[TMP39:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP38]], [[PRED_LOAD_IF]] ]
				; FVW2-NEXT: [[TMP40:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
				; FVW2-NEXT: br i1 [[TMP40]], label [[PRED_LOAD_IF8:%.*]], label [[PRED_LOAD_CONTINUE9:%.*]]
				; FVW2: pred.load.if8:
				; FVW2-NEXT: [[TMP41:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP0]], i32 1
				; FVW2-NEXT: [[TMP42:%.*]] = load float, float addrspace(1)* [[TMP41]], align 4
				; FVW2-NEXT: [[TMP43:%.*]] = insertelement <2 x float> [[TMP39]], float [[TMP42]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE9]]
				; FVW2: pred.load.continue9:
				; FVW2-NEXT: [[TMP44:%.*]] = phi <2 x float> [ [[TMP39]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP43]], [[PRED_LOAD_IF8]] ]
				; FVW2-NEXT: [[TMP45:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
				; FVW2-NEXT: br i1 [[TMP45]], label [[PRED_LOAD_IF10:%.*]], label [[PRED_LOAD_CONTINUE11:%.*]]
				; FVW2: pred.load.if10:
				; FVW2-NEXT: [[TMP46:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP47:%.*]] = load float, float addrspace(1)* [[TMP46]], align 4
				; FVW2-NEXT: [[TMP48:%.*]] = insertelement <2 x float> poison, float [[TMP47]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE11]]
				; FVW2: pred.load.continue11:
				; FVW2-NEXT: [[TMP49:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE9]] ], [ [[TMP48]], [[PRED_LOAD_IF10]] ]
				; FVW2-NEXT: [[TMP50:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP50]], label [[PRED_LOAD_IF12:%.*]], label [[PRED_LOAD_CONTINUE13:%.*]]
				; FVW2: pred.load.if12:
				; FVW2-NEXT: [[TMP51:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP52:%.*]] = load float, float addrspace(1)* [[TMP51]], align 4
				; FVW2-NEXT: [[TMP53:%.*]] = insertelement <2 x float> [[TMP49]], float [[TMP52]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE13]]
				; FVW2: pred.load.continue13:
				; FVW2-NEXT: [[TMP54:%.*]] = phi <2 x float> [ [[TMP49]], [[PRED_LOAD_CONTINUE11]] ], [ [[TMP53]], [[PRED_LOAD_IF12]] ]
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP55]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]]
				; FVW2: pred.load.if14:
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float addrspace(1)* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> poison, float [[TMP57]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]]
				; FVW2: pred.load.continue15:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], [[PRED_LOAD_IF14]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]]
				; FVW2: pred.load.if16:
				; FVW2-NEXT: [[TMP61:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP62:%.*]] = load float, float addrspace(1)* [[TMP61]], align 4
				; FVW2-NEXT: [[TMP63:%.*]] = insertelement <2 x float> [[TMP59]], float [[TMP62]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]]
				; FVW2: pred.load.continue17:
				; FVW2-NEXT: [[TMP64:%.*]] = phi <2 x float> [ [[TMP59]], [[PRED_LOAD_CONTINUE15]] ], [ [[TMP63]], [[PRED_LOAD_IF16]] ]
				; FVW2-NEXT: [[TMP65:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP65]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]]
				; FVW2: pred.load.if18:
				; FVW2-NEXT: [[TMP66:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP67:%.*]] = load float, float addrspace(1)* [[TMP66]], align 4
				; FVW2-NEXT: [[TMP68:%.*]] = insertelement <2 x float> poison, float [[TMP67]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]]
				; FVW2: pred.load.continue19:
				; FVW2-NEXT: [[TMP69:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE17]] ], [ [[TMP68]], [[PRED_LOAD_IF18]] ]
				; FVW2-NEXT: [[TMP70:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP70]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]]
				; FVW2: pred.load.if20:
				; FVW2-NEXT: [[TMP71:%.*]] = getelementptr inbounds [[STRUCT_IN]], [[STRUCT_IN]] addrspace(1)* [[IN]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP72:%.*]] = load float, float addrspace(1)* [[TMP71]], align 4
				; FVW2-NEXT: [[TMP73:%.*]] = insertelement <2 x float> [[TMP69]], float [[TMP72]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]]
				; FVW2: pred.load.continue21:
				; FVW2-NEXT: [[TMP74:%.*]] = phi <2 x float> [ [[TMP69]], [[PRED_LOAD_CONTINUE19]] ], [ [[TMP73]], [[PRED_LOAD_IF20]] ]
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP44]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = fadd <2 x float> [[TMP54]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP77:%.*]] = fadd <2 x float> [[TMP64]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP78:%.*]] = fadd <2 x float> [[TMP74]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP79:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP79]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; FVW2: pred.store.if:			; FVW2: pred.store.if:
	; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[OUT:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP80:%.*]] = getelementptr inbounds float, float* [[OUT:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i64 0			; FVW2-NEXT: [[TMP81:%.*]] = extractelement <2 x float> [[TMP75]], i64 0
	; FVW2-NEXT: store float [[TMP12]], float* [[TMP11]], align 4			; FVW2-NEXT: store float [[TMP81]], float* [[TMP80]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]
	; FVW2: pred.store.continue:			; FVW2: pred.store.continue:
	; FVW2-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP7]], i64 1			; FVW2-NEXT: [[TMP82:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
	; FVW2-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br i1 [[TMP82]], label [[PRED_STORE_IF22:%.*]], label [[PRED_STORE_CONTINUE23:%.*]]
	; FVW2: pred.store.if8:			; FVW2: pred.store.if22:
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP83:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP0]]
	; FVW2-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP9]], i64 1			; FVW2-NEXT: [[TMP84:%.*]] = extractelement <2 x float> [[TMP75]], i64 1
	; FVW2-NEXT: store float [[TMP15]], float* [[TMP14]], align 4			; FVW2-NEXT: store float [[TMP84]], float* [[TMP83]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE23]]
	; FVW2: pred.store.continue9:			; FVW2: pred.store.continue23:
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX10]], 2			; FVW2-NEXT: [[TMP85:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
	; FVW2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 32, i64 32>			; FVW2-NEXT: br i1 [[TMP85]], label [[PRED_STORE_IF24:%.*]], label [[PRED_STORE_CONTINUE25:%.*]]
	; FVW2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256			; FVW2: pred.store.if24:
	; FVW2-NEXT: br i1 [[TMP16]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; FVW2-NEXT: [[TMP86:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP1]]
				; FVW2-NEXT: [[TMP87:%.*]] = extractelement <2 x float> [[TMP76]], i64 0
				; FVW2-NEXT: store float [[TMP87]], float* [[TMP86]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE25]]
				; FVW2: pred.store.continue25:
				; FVW2-NEXT: [[TMP88:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP88]], label [[PRED_STORE_IF26:%.*]], label [[PRED_STORE_CONTINUE27:%.*]]
				; FVW2: pred.store.if26:
				; FVW2-NEXT: [[TMP89:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP2]]
				; FVW2-NEXT: [[TMP90:%.*]] = extractelement <2 x float> [[TMP76]], i64 1
				; FVW2-NEXT: store float [[TMP90]], float* [[TMP89]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE27]]
				; FVW2: pred.store.continue27:
				; FVW2-NEXT: [[TMP91:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP91]], label [[PRED_STORE_IF28:%.*]], label [[PRED_STORE_CONTINUE29:%.*]]
				; FVW2: pred.store.if28:
				; FVW2-NEXT: [[TMP92:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP3]]
				; FVW2-NEXT: [[TMP93:%.*]] = extractelement <2 x float> [[TMP77]], i64 0
				; FVW2-NEXT: store float [[TMP93]], float* [[TMP92]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE29]]
				; FVW2: pred.store.continue29:
				; FVW2-NEXT: [[TMP94:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP94]], label [[PRED_STORE_IF30:%.*]], label [[PRED_STORE_CONTINUE31:%.*]]
				; FVW2: pred.store.if30:
				; FVW2-NEXT: [[TMP95:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP96:%.*]] = extractelement <2 x float> [[TMP77]], i64 1
				; FVW2-NEXT: store float [[TMP96]], float* [[TMP95]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE31]]
				; FVW2: pred.store.continue31:
				; FVW2-NEXT: [[TMP97:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP97]], label [[PRED_STORE_IF32:%.*]], label [[PRED_STORE_CONTINUE33:%.*]]
				; FVW2: pred.store.if32:
				; FVW2-NEXT: [[TMP98:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP99:%.*]] = extractelement <2 x float> [[TMP78]], i64 0
				; FVW2-NEXT: store float [[TMP99]], float* [[TMP98]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE33]]
				; FVW2: pred.store.continue33:
				; FVW2-NEXT: [[TMP100:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP100]], label [[PRED_STORE_IF34:%.*]], label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.if34:
				; FVW2-NEXT: [[TMP101:%.*]] = getelementptr inbounds float, float* [[OUT]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP102:%.*]] = extractelement <2 x float> [[TMP78]], i64 1
				; FVW2-NEXT: store float [[TMP102]], float* [[TMP101]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.continue35:
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8
				; FVW2-NEXT: [[TMP103:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
				; FVW2-NEXT: br i1 [[TMP103]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca %struct.In addrspace(1)*, align 8			%in.addr = alloca %struct.In addrspace(1)*, align 8
	%out.addr = alloca float addrspace(0)*, align 8			%out.addr = alloca float addrspace(0)*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%index.addr = alloca i32*, align 8			%index.addr = alloca i32*, align 8
	; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>			; AVX512-NEXT: [[TMP79:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], <16 x i64> <i64 3840, i64 3856, i64 3872, i64 3888, i64 3904, i64 3920, i64 3936, i64 3952, i64 3968, i64 3984, i64 4000, i64 4016, i64 4032, i64 4048, i64 4064, i64 4080>
	; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p1f32(<16 x float> [[TMP78]], <16 x float addrspace(1)*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])			; AVX512-NEXT: call void @llvm.masked.scatter.v16f32.v16p1f32(<16 x float> [[TMP78]], <16 x float addrspace(1)*> [[TMP79]], i32 4, <16 x i1> [[TMP76]])
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	; FVW2-LABEL: @foo2_addrspace3(			; FVW2-LABEL: @foo2_addrspace3(
	; FVW2-NEXT: entry:			; FVW2-NEXT: entry:
	; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]			; FVW2-NEXT: br label [[VECTOR_BODY:%.*]]
	; FVW2: vector.body:			; FVW2: vector.body:
	; FVW2-NEXT: [[INDEX10:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE9:%.*]] ]			; FVW2-NEXT: [[INDEX7:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE35:%.*]] ]
	; FVW2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 16>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE9]] ]			; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX7]], 4
	; FVW2-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX10]], 4
	; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16			; FVW2-NEXT: [[TMP0:%.*]] = or i64 [[OFFSET_IDX]], 16
	; FVW2-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP1:%.*]] = or i64 [[OFFSET_IDX]], 32
	; FVW2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP2:%.*]] = or i64 [[OFFSET_IDX]], 48
	; FVW2-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1]], align 4			; FVW2-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 64
	; FVW2-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP2]], align 4			; FVW2-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 80
	; FVW2-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[TMP3]], i64 0			; FVW2-NEXT: [[TMP5:%.*]] = or i64 [[OFFSET_IDX]], 96
	; FVW2-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP4]], i64 1			; FVW2-NEXT: [[TMP6:%.*]] = or i64 [[OFFSET_IDX]], 112
	; FVW2-NEXT: [[TMP7:%.*]] = icmp sgt <2 x i32> [[TMP6]], zeroinitializer			; FVW2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], <2 x i64> [[VEC_IND]], i32 1			; FVW2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP0]]
	; FVW2-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <2 x float> @llvm.masked.gather.v2f32.v2p0f32(<2 x float*> [[TMP8]], i32 4, <2 x i1> [[TMP7]], <2 x float> undef)			; FVW2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP1]]
	; FVW2-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[WIDE_MASKED_GATHER]], <float 5.000000e-01, float 5.000000e-01>			; FVW2-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP2]]
	; FVW2-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP7]], i64 0			; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP3]]
	; FVW2-NEXT: br i1 [[TMP10]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; FVW2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4
				; FVW2-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4
				; FVW2-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP15]], i64 0
				; FVW2-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP16]], i64 1
				; FVW2-NEXT: [[TMP19:%.*]] = load i32, i32* [[TMP9]], align 4
				; FVW2-NEXT: [[TMP20:%.*]] = load i32, i32* [[TMP10]], align 4
				; FVW2-NEXT: [[TMP21:%.*]] = insertelement <2 x i32> poison, i32 [[TMP19]], i64 0
				; FVW2-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> [[TMP21]], i32 [[TMP20]], i64 1
				; FVW2-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP11]], align 4
				; FVW2-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP12]], align 4
				; FVW2-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP23]], i64 0
				; FVW2-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP24]], i64 1
				; FVW2-NEXT: [[TMP27:%.*]] = load i32, i32* [[TMP13]], align 4
				; FVW2-NEXT: [[TMP28:%.*]] = load i32, i32* [[TMP14]], align 4
				; FVW2-NEXT: [[TMP29:%.*]] = insertelement <2 x i32> poison, i32 [[TMP27]], i64 0
				; FVW2-NEXT: [[TMP30:%.*]] = insertelement <2 x i32> [[TMP29]], i32 [[TMP28]], i64 1
				; FVW2-NEXT: [[TMP31:%.*]] = icmp sgt <2 x i32> [[TMP18]], zeroinitializer
				; FVW2-NEXT: [[TMP32:%.*]] = icmp sgt <2 x i32> [[TMP22]], zeroinitializer
				; FVW2-NEXT: [[TMP33:%.*]] = icmp sgt <2 x i32> [[TMP26]], zeroinitializer
				; FVW2-NEXT: [[TMP34:%.*]] = icmp sgt <2 x i32> [[TMP30]], zeroinitializer
				; FVW2-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP35]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; FVW2: pred.load.if:
				; FVW2-NEXT: [[TMP36:%.*]] = getelementptr inbounds [[STRUCT_IN:%.*]], %struct.In* [[IN:%.*]], i64 [[OFFSET_IDX]], i32 1
				; FVW2-NEXT: [[TMP37:%.*]] = load float, float* [[TMP36]], align 4
				; FVW2-NEXT: [[TMP38:%.*]] = insertelement <2 x float> poison, float [[TMP37]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; FVW2: pred.load.continue:
				; FVW2-NEXT: [[TMP39:%.*]] = phi <2 x float> [ poison, [[VECTOR_BODY]] ], [ [[TMP38]], [[PRED_LOAD_IF]] ]
				; FVW2-NEXT: [[TMP40:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
				; FVW2-NEXT: br i1 [[TMP40]], label [[PRED_LOAD_IF8:%.*]], label [[PRED_LOAD_CONTINUE9:%.*]]
				; FVW2: pred.load.if8:
				; FVW2-NEXT: [[TMP41:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP0]], i32 1
				; FVW2-NEXT: [[TMP42:%.*]] = load float, float* [[TMP41]], align 4
				; FVW2-NEXT: [[TMP43:%.*]] = insertelement <2 x float> [[TMP39]], float [[TMP42]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE9]]
				; FVW2: pred.load.continue9:
				; FVW2-NEXT: [[TMP44:%.*]] = phi <2 x float> [ [[TMP39]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP43]], [[PRED_LOAD_IF8]] ]
				; FVW2-NEXT: [[TMP45:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
				; FVW2-NEXT: br i1 [[TMP45]], label [[PRED_LOAD_IF10:%.*]], label [[PRED_LOAD_CONTINUE11:%.*]]
				; FVW2: pred.load.if10:
				; FVW2-NEXT: [[TMP46:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP1]], i32 1
				; FVW2-NEXT: [[TMP47:%.*]] = load float, float* [[TMP46]], align 4
				; FVW2-NEXT: [[TMP48:%.*]] = insertelement <2 x float> poison, float [[TMP47]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE11]]
				; FVW2: pred.load.continue11:
				; FVW2-NEXT: [[TMP49:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE9]] ], [ [[TMP48]], [[PRED_LOAD_IF10]] ]
				; FVW2-NEXT: [[TMP50:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP50]], label [[PRED_LOAD_IF12:%.*]], label [[PRED_LOAD_CONTINUE13:%.*]]
				; FVW2: pred.load.if12:
				; FVW2-NEXT: [[TMP51:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP2]], i32 1
				; FVW2-NEXT: [[TMP52:%.*]] = load float, float* [[TMP51]], align 4
				; FVW2-NEXT: [[TMP53:%.*]] = insertelement <2 x float> [[TMP49]], float [[TMP52]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE13]]
				; FVW2: pred.load.continue13:
				; FVW2-NEXT: [[TMP54:%.*]] = phi <2 x float> [ [[TMP49]], [[PRED_LOAD_CONTINUE11]] ], [ [[TMP53]], [[PRED_LOAD_IF12]] ]
				; FVW2-NEXT: [[TMP55:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP55]], label [[PRED_LOAD_IF14:%.*]], label [[PRED_LOAD_CONTINUE15:%.*]]
				; FVW2: pred.load.if14:
				; FVW2-NEXT: [[TMP56:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP3]], i32 1
				; FVW2-NEXT: [[TMP57:%.*]] = load float, float* [[TMP56]], align 4
				; FVW2-NEXT: [[TMP58:%.*]] = insertelement <2 x float> poison, float [[TMP57]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE15]]
				; FVW2: pred.load.continue15:
				; FVW2-NEXT: [[TMP59:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], [[PRED_LOAD_IF14]] ]
				; FVW2-NEXT: [[TMP60:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP60]], label [[PRED_LOAD_IF16:%.*]], label [[PRED_LOAD_CONTINUE17:%.*]]
				; FVW2: pred.load.if16:
				; FVW2-NEXT: [[TMP61:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP4]], i32 1
				; FVW2-NEXT: [[TMP62:%.*]] = load float, float* [[TMP61]], align 4
				; FVW2-NEXT: [[TMP63:%.*]] = insertelement <2 x float> [[TMP59]], float [[TMP62]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE17]]
				; FVW2: pred.load.continue17:
				; FVW2-NEXT: [[TMP64:%.*]] = phi <2 x float> [ [[TMP59]], [[PRED_LOAD_CONTINUE15]] ], [ [[TMP63]], [[PRED_LOAD_IF16]] ]
				; FVW2-NEXT: [[TMP65:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP65]], label [[PRED_LOAD_IF18:%.*]], label [[PRED_LOAD_CONTINUE19:%.*]]
				; FVW2: pred.load.if18:
				; FVW2-NEXT: [[TMP66:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP5]], i32 1
				; FVW2-NEXT: [[TMP67:%.*]] = load float, float* [[TMP66]], align 4
				; FVW2-NEXT: [[TMP68:%.*]] = insertelement <2 x float> poison, float [[TMP67]], i64 0
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE19]]
				; FVW2: pred.load.continue19:
				; FVW2-NEXT: [[TMP69:%.*]] = phi <2 x float> [ poison, [[PRED_LOAD_CONTINUE17]] ], [ [[TMP68]], [[PRED_LOAD_IF18]] ]
				; FVW2-NEXT: [[TMP70:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP70]], label [[PRED_LOAD_IF20:%.*]], label [[PRED_LOAD_CONTINUE21:%.*]]
				; FVW2: pred.load.if20:
				; FVW2-NEXT: [[TMP71:%.*]] = getelementptr inbounds [[STRUCT_IN]], %struct.In* [[IN]], i64 [[TMP6]], i32 1
				; FVW2-NEXT: [[TMP72:%.*]] = load float, float* [[TMP71]], align 4
				; FVW2-NEXT: [[TMP73:%.*]] = insertelement <2 x float> [[TMP69]], float [[TMP72]], i64 1
				; FVW2-NEXT: br label [[PRED_LOAD_CONTINUE21]]
				; FVW2: pred.load.continue21:
				; FVW2-NEXT: [[TMP74:%.*]] = phi <2 x float> [ [[TMP69]], [[PRED_LOAD_CONTINUE19]] ], [ [[TMP73]], [[PRED_LOAD_IF20]] ]
				; FVW2-NEXT: [[TMP75:%.*]] = fadd <2 x float> [[TMP44]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP76:%.*]] = fadd <2 x float> [[TMP54]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP77:%.*]] = fadd <2 x float> [[TMP64]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP78:%.*]] = fadd <2 x float> [[TMP74]], <float 5.000000e-01, float 5.000000e-01>
				; FVW2-NEXT: [[TMP79:%.*]] = extractelement <2 x i1> [[TMP31]], i64 0
				; FVW2-NEXT: br i1 [[TMP79]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; FVW2: pred.store.if:			; FVW2: pred.store.if:
	; FVW2-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT:%.*]], i64 [[OFFSET_IDX]]			; FVW2-NEXT: [[TMP80:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT:%.*]], i64 [[OFFSET_IDX]]
	; FVW2-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i64 0			; FVW2-NEXT: [[TMP81:%.*]] = extractelement <2 x float> [[TMP75]], i64 0
	; FVW2-NEXT: store float [[TMP12]], float addrspace(1)* [[TMP11]], align 4			; FVW2-NEXT: store float [[TMP81]], float addrspace(1)* [[TMP80]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE]]
	; FVW2: pred.store.continue:			; FVW2: pred.store.continue:
	; FVW2-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP7]], i64 1			; FVW2-NEXT: [[TMP82:%.*]] = extractelement <2 x i1> [[TMP31]], i64 1
	; FVW2-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br i1 [[TMP82]], label [[PRED_STORE_IF22:%.*]], label [[PRED_STORE_CONTINUE23:%.*]]
	; FVW2: pred.store.if8:			; FVW2: pred.store.if22:
	; FVW2-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP0]]			; FVW2-NEXT: [[TMP83:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP0]]
	; FVW2-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP9]], i64 1			; FVW2-NEXT: [[TMP84:%.*]] = extractelement <2 x float> [[TMP75]], i64 1
	; FVW2-NEXT: store float [[TMP15]], float addrspace(1)* [[TMP14]], align 4			; FVW2-NEXT: store float [[TMP84]], float addrspace(1)* [[TMP83]], align 4
	; FVW2-NEXT: br label [[PRED_STORE_CONTINUE9]]			; FVW2-NEXT: br label [[PRED_STORE_CONTINUE23]]
	; FVW2: pred.store.continue9:			; FVW2: pred.store.continue23:
	; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX10]], 2			; FVW2-NEXT: [[TMP85:%.*]] = extractelement <2 x i1> [[TMP32]], i64 0
	; FVW2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 32, i64 32>			; FVW2-NEXT: br i1 [[TMP85]], label [[PRED_STORE_IF24:%.*]], label [[PRED_STORE_CONTINUE25:%.*]]
	; FVW2-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256			; FVW2: pred.store.if24:
	; FVW2-NEXT: br i1 [[TMP16]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; FVW2-NEXT: [[TMP86:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP1]]
				; FVW2-NEXT: [[TMP87:%.*]] = extractelement <2 x float> [[TMP76]], i64 0
				; FVW2-NEXT: store float [[TMP87]], float addrspace(1)* [[TMP86]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE25]]
				; FVW2: pred.store.continue25:
				; FVW2-NEXT: [[TMP88:%.*]] = extractelement <2 x i1> [[TMP32]], i64 1
				; FVW2-NEXT: br i1 [[TMP88]], label [[PRED_STORE_IF26:%.*]], label [[PRED_STORE_CONTINUE27:%.*]]
				; FVW2: pred.store.if26:
				; FVW2-NEXT: [[TMP89:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP2]]
				; FVW2-NEXT: [[TMP90:%.*]] = extractelement <2 x float> [[TMP76]], i64 1
				; FVW2-NEXT: store float [[TMP90]], float addrspace(1)* [[TMP89]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE27]]
				; FVW2: pred.store.continue27:
				; FVW2-NEXT: [[TMP91:%.*]] = extractelement <2 x i1> [[TMP33]], i64 0
				; FVW2-NEXT: br i1 [[TMP91]], label [[PRED_STORE_IF28:%.*]], label [[PRED_STORE_CONTINUE29:%.*]]
				; FVW2: pred.store.if28:
				; FVW2-NEXT: [[TMP92:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP3]]
				; FVW2-NEXT: [[TMP93:%.*]] = extractelement <2 x float> [[TMP77]], i64 0
				; FVW2-NEXT: store float [[TMP93]], float addrspace(1)* [[TMP92]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE29]]
				; FVW2: pred.store.continue29:
				; FVW2-NEXT: [[TMP94:%.*]] = extractelement <2 x i1> [[TMP33]], i64 1
				; FVW2-NEXT: br i1 [[TMP94]], label [[PRED_STORE_IF30:%.*]], label [[PRED_STORE_CONTINUE31:%.*]]
				; FVW2: pred.store.if30:
				; FVW2-NEXT: [[TMP95:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP4]]
				; FVW2-NEXT: [[TMP96:%.*]] = extractelement <2 x float> [[TMP77]], i64 1
				; FVW2-NEXT: store float [[TMP96]], float addrspace(1)* [[TMP95]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE31]]
				; FVW2: pred.store.continue31:
				; FVW2-NEXT: [[TMP97:%.*]] = extractelement <2 x i1> [[TMP34]], i64 0
				; FVW2-NEXT: br i1 [[TMP97]], label [[PRED_STORE_IF32:%.*]], label [[PRED_STORE_CONTINUE33:%.*]]
				; FVW2: pred.store.if32:
				; FVW2-NEXT: [[TMP98:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP5]]
				; FVW2-NEXT: [[TMP99:%.*]] = extractelement <2 x float> [[TMP78]], i64 0
				; FVW2-NEXT: store float [[TMP99]], float addrspace(1)* [[TMP98]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE33]]
				; FVW2: pred.store.continue33:
				; FVW2-NEXT: [[TMP100:%.*]] = extractelement <2 x i1> [[TMP34]], i64 1
				; FVW2-NEXT: br i1 [[TMP100]], label [[PRED_STORE_IF34:%.*]], label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.if34:
				; FVW2-NEXT: [[TMP101:%.*]] = getelementptr inbounds float, float addrspace(1)* [[OUT]], i64 [[TMP6]]
				; FVW2-NEXT: [[TMP102:%.*]] = extractelement <2 x float> [[TMP78]], i64 1
				; FVW2-NEXT: store float [[TMP102]], float addrspace(1)* [[TMP101]], align 4
				; FVW2-NEXT: br label [[PRED_STORE_CONTINUE35]]
				; FVW2: pred.store.continue35:
				; FVW2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX7]], 8
				; FVW2-NEXT: [[TMP103:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
				; FVW2-NEXT: br i1 [[TMP103]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; FVW2: for.end:			; FVW2: for.end:
	; FVW2-NEXT: ret void			; FVW2-NEXT: ret void
	;			;
	entry:			entry:
	%in.addr = alloca %struct.In addrspace(0)*, align 8			%in.addr = alloca %struct.In addrspace(0)*, align 8
	%out.addr = alloca float addrspace(1)*, align 8			%out.addr = alloca float addrspace(1)*, align 8
	%trigger.addr = alloca i32*, align 8			%trigger.addr = alloca i32*, align 8
	%index.addr = alloca i32*, align 8			%index.addr = alloca i32*, align 8

llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll

	define dso_local void @masked_strided2(i8* noalias nocapture readonly %p, i8* noalias nocapture %q, i8 zeroext %guard) local_unnamed_addr {			define dso_local void @masked_strided2(i8* noalias nocapture readonly %p, i8* noalias nocapture %q, i8 zeroext %guard) local_unnamed_addr {
	; DISABLED_MASKED_STRIDED-LABEL: @masked_strided2(			; DISABLED_MASKED_STRIDED-LABEL: @masked_strided2(
	; DISABLED_MASKED_STRIDED-NEXT: entry:			; DISABLED_MASKED_STRIDED-NEXT: entry:
	; DISABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32			; DISABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
	; DISABLED_MASKED_STRIDED: vector.body:			; DISABLED_MASKED_STRIDED: vector.body:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE60:%.*]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE44:%.*]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE60]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[ENTRY]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE44]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.load.if:			; DISABLED_MASKED_STRIDED: pred.load.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP3]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP3]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP100:%.*]] = select <8 x i1> [[TMP99]], <8 x i8> [[TMP98]], <8 x i8> [[TMP49]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP100:%.*]] = select <8 x i1> [[TMP99]], <8 x i8> [[TMP98]], <8 x i8> [[TMP49]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP101:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP101:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP101]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP101]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if:			; DISABLED_MASKED_STRIDED: pred.store.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP102]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP102]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP104]], i8* [[TMP103]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP104]], i8* [[TMP103]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = sub i8 0, [[TMP105]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP50]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP106]], i8* [[TMP108]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]
	; DISABLED_MASKED_STRIDED: pred.store.continue:			; DISABLED_MASKED_STRIDED: pred.store.continue:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP105]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP109]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if31:			; DISABLED_MASKED_STRIDED: pred.store.if31:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i32> [[TMP1]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i32> [[TMP1]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP106]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP110]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP108]], i8* [[TMP107]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP112]], i8* [[TMP111]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = sub i8 0, [[TMP113]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP50]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP114]], i8* [[TMP116]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE32]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE32]]
	; DISABLED_MASKED_STRIDED: pred.store.continue32:			; DISABLED_MASKED_STRIDED: pred.store.continue32:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP109]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP117]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if33:			; DISABLED_MASKED_STRIDED: pred.store.if33:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i32> [[TMP1]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i32> [[TMP1]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP110]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP118]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP112]], i8* [[TMP111]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP120]], i8* [[TMP119]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = sub i8 0, [[TMP121]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP50]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP122]], i8* [[TMP124]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]
	; DISABLED_MASKED_STRIDED: pred.store.continue34:			; DISABLED_MASKED_STRIDED: pred.store.continue34:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP113]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP125]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if35:			; DISABLED_MASKED_STRIDED: pred.store.if35:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i32> [[TMP1]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i32> [[TMP1]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP114]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP126]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP116]], i8* [[TMP115]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP128]], i8* [[TMP127]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = sub i8 0, [[TMP129]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP50]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP130]], i8* [[TMP132]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]
	; DISABLED_MASKED_STRIDED: pred.store.continue36:			; DISABLED_MASKED_STRIDED: pred.store.continue36:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP117]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP133]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if37:			; DISABLED_MASKED_STRIDED: pred.store.if37:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i32> [[TMP1]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i32> [[TMP1]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP118]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP134]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP120]], i8* [[TMP119]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP136]], i8* [[TMP135]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = sub i8 0, [[TMP137]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP50]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP138]], i8* [[TMP140]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]
	; DISABLED_MASKED_STRIDED: pred.store.continue38:			; DISABLED_MASKED_STRIDED: pred.store.continue38:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP121]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP141]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if39:			; DISABLED_MASKED_STRIDED: pred.store.if39:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i32> [[TMP1]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i32> [[TMP1]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP122]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP142]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP124]], i8* [[TMP123]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP144]], i8* [[TMP143]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = sub i8 0, [[TMP145]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP50]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP146]], i8* [[TMP148]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]
	; DISABLED_MASKED_STRIDED: pred.store.continue40:			; DISABLED_MASKED_STRIDED: pred.store.continue40:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP125]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP149]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if41:			; DISABLED_MASKED_STRIDED: pred.store.if41:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i32> [[TMP1]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i32> [[TMP1]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP126]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP150]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP128]], i8* [[TMP127]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP152]], i8* [[TMP151]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = sub i8 0, [[TMP153]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP50]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP154]], i8* [[TMP156]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]
	; DISABLED_MASKED_STRIDED: pred.store.continue42:			; DISABLED_MASKED_STRIDED: pred.store.continue42:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP129]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP157]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44]]
	; DISABLED_MASKED_STRIDED: pred.store.if43:			; DISABLED_MASKED_STRIDED: pred.store.if43:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i32> [[TMP1]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i32> [[TMP1]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP130]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP158]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP132]], i8* [[TMP131]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP160]], i8* [[TMP159]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7
	; DISABLED_MASKED_STRIDED: pred.store.continue44:			; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = sub i8 0, [[TMP161]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = sub <8 x i8> zeroinitializer, [[TMP100]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP134]], label [[PRED_STORE_IF45:%.*]], label [[PRED_STORE_CONTINUE46:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if45:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = extractelement <8 x i32> [[TMP50]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP135]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP133]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP137]], i8* [[TMP136]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE46]]
	; DISABLED_MASKED_STRIDED: pred.store.continue46:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP138]], label [[PRED_STORE_IF47:%.*]], label [[PRED_STORE_CONTINUE48:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if47:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP50]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i8> [[TMP133]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP141]], i8* [[TMP140]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE48]]
	; DISABLED_MASKED_STRIDED: pred.store.continue48:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP142]], label [[PRED_STORE_IF49:%.*]], label [[PRED_STORE_CONTINUE50:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if49:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i32> [[TMP50]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP143]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = extractelement <8 x i8> [[TMP133]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP145]], i8* [[TMP144]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE50]]
	; DISABLED_MASKED_STRIDED: pred.store.continue50:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP146]], label [[PRED_STORE_IF51:%.*]], label [[PRED_STORE_CONTINUE52:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if51:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP50]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i8> [[TMP133]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP149]], i8* [[TMP148]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE52]]
	; DISABLED_MASKED_STRIDED: pred.store.continue52:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP150]], label [[PRED_STORE_IF53:%.*]], label [[PRED_STORE_CONTINUE54:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if53:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i32> [[TMP50]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP151]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = extractelement <8 x i8> [[TMP133]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP153]], i8* [[TMP152]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE54]]
	; DISABLED_MASKED_STRIDED: pred.store.continue54:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP154]], label [[PRED_STORE_IF55:%.*]], label [[PRED_STORE_CONTINUE56:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if55:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP50]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i8> [[TMP133]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP157]], i8* [[TMP156]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE56]]
	; DISABLED_MASKED_STRIDED: pred.store.continue56:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP158]], label [[PRED_STORE_IF57:%.*]], label [[PRED_STORE_CONTINUE58:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if57:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i32> [[TMP50]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP159]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = extractelement <8 x i8> [[TMP133]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP161]], i8* [[TMP160]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE58]]
	; DISABLED_MASKED_STRIDED: pred.store.continue58:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP162]], label [[PRED_STORE_IF59:%.*]], label [[PRED_STORE_CONTINUE60]]
	; DISABLED_MASKED_STRIDED: pred.store.if59:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP50]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP50]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i8> [[TMP133]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP162]], i8* [[TMP164]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP165]], i8* [[TMP164]], align 1			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE60]]			; DISABLED_MASKED_STRIDED: pred.store.continue44:
	; DISABLED_MASKED_STRIDED: pred.store.continue60:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024			; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP166]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP165]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; DISABLED_MASKED_STRIDED: for.end:			; DISABLED_MASKED_STRIDED: for.end:
	; DISABLED_MASKED_STRIDED-NEXT: ret void			; DISABLED_MASKED_STRIDED-NEXT: ret void
	;			;
	; ENABLED_MASKED_STRIDED-LABEL: @masked_strided2(			; ENABLED_MASKED_STRIDED-LABEL: @masked_strided2(
	; ENABLED_MASKED_STRIDED-NEXT: entry:			; ENABLED_MASKED_STRIDED-NEXT: entry:
	; ENABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32			; ENABLED_MASKED_STRIDED-NEXT: [[CONV:%.*]] = zext i8 [[GUARD:%.*]] to i32
	; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ugt i8* [[TMP2]], [[SCEVGEP2]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ugt i8* [[TMP2]], [[SCEVGEP2]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i1 [[TMP1]], [[TMP3]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i1 [[TMP1]], [[TMP3]]
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]
	; DISABLED_MASKED_STRIDED: vector.ph:			; DISABLED_MASKED_STRIDED: vector.ph:
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
	; DISABLED_MASKED_STRIDED: vector.body:			; DISABLED_MASKED_STRIDED: vector.body:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE65:%.*]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE49:%.*]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 1024, i32 1023, i32 1022, i32 1021, i32 1020, i32 1019, i32 1018, i32 1017>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE65]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 1024, i32 1023, i32 1022, i32 1021, i32 1020, i32 1019, i32 1018, i32 1017>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE49]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; DISABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.load.if:			; DISABLED_MASKED_STRIDED: pred.load.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP8]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP8]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP9]], align 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP9]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = select <8 x i1> [[TMP104]], <8 x i8> [[TMP103]], <8 x i8> [[TMP54]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = select <8 x i1> [[TMP104]], <8 x i8> [[TMP103]], <8 x i8> [[TMP54]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP106]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP106]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if:			; DISABLED_MASKED_STRIDED: pred.store.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP109]], i8* [[TMP108]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP109]], i8* [[TMP108]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = sub i8 0, [[TMP110]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i32> [[TMP55]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP112]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP111]], i8* [[TMP113]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]
	; DISABLED_MASKED_STRIDED: pred.store.continue:			; DISABLED_MASKED_STRIDED: pred.store.continue:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP110]], label [[PRED_STORE_IF36:%.*]], label [[PRED_STORE_CONTINUE37:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP114]], label [[PRED_STORE_IF36:%.*]], label [[PRED_STORE_CONTINUE37:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if36:			; DISABLED_MASKED_STRIDED: pred.store.if36:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP111]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP113]], i8* [[TMP112]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP117]], i8* [[TMP116]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = sub i8 0, [[TMP118]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i32> [[TMP55]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP120]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP119]], i8* [[TMP121]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE37]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE37]]
	; DISABLED_MASKED_STRIDED: pred.store.continue37:			; DISABLED_MASKED_STRIDED: pred.store.continue37:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP114]], label [[PRED_STORE_IF38:%.*]], label [[PRED_STORE_CONTINUE39:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP122]], label [[PRED_STORE_IF38:%.*]], label [[PRED_STORE_CONTINUE39:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if38:			; DISABLED_MASKED_STRIDED: pred.store.if38:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP117]], i8* [[TMP116]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP125]], i8* [[TMP124]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = sub i8 0, [[TMP126]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i32> [[TMP55]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP128]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP127]], i8* [[TMP129]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE39]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE39]]
	; DISABLED_MASKED_STRIDED: pred.store.continue39:			; DISABLED_MASKED_STRIDED: pred.store.continue39:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP118]], label [[PRED_STORE_IF40:%.*]], label [[PRED_STORE_CONTINUE41:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP130]], label [[PRED_STORE_IF40:%.*]], label [[PRED_STORE_CONTINUE41:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if40:			; DISABLED_MASKED_STRIDED: pred.store.if40:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP119]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP121]], i8* [[TMP120]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP133]], i8* [[TMP132]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = sub i8 0, [[TMP134]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i32> [[TMP55]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP136]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP135]], i8* [[TMP137]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE41]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE41]]
	; DISABLED_MASKED_STRIDED: pred.store.continue41:			; DISABLED_MASKED_STRIDED: pred.store.continue41:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP122]], label [[PRED_STORE_IF42:%.*]], label [[PRED_STORE_CONTINUE43:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP138]], label [[PRED_STORE_IF42:%.*]], label [[PRED_STORE_CONTINUE43:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if42:			; DISABLED_MASKED_STRIDED: pred.store.if42:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP125]], i8* [[TMP124]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP141]], i8* [[TMP140]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = sub i8 0, [[TMP142]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i32> [[TMP55]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP144]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP143]], i8* [[TMP145]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE43]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE43]]
	; DISABLED_MASKED_STRIDED: pred.store.continue43:			; DISABLED_MASKED_STRIDED: pred.store.continue43:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP126]], label [[PRED_STORE_IF44:%.*]], label [[PRED_STORE_CONTINUE45:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP146]], label [[PRED_STORE_IF44:%.*]], label [[PRED_STORE_CONTINUE45:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if44:			; DISABLED_MASKED_STRIDED: pred.store.if44:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP127]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP129]], i8* [[TMP128]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP149]], i8* [[TMP148]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = sub i8 0, [[TMP150]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i32> [[TMP55]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP152]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP151]], i8* [[TMP153]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE45]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE45]]
	; DISABLED_MASKED_STRIDED: pred.store.continue45:			; DISABLED_MASKED_STRIDED: pred.store.continue45:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP130]], label [[PRED_STORE_IF46:%.*]], label [[PRED_STORE_CONTINUE47:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP154]], label [[PRED_STORE_IF46:%.*]], label [[PRED_STORE_CONTINUE47:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if46:			; DISABLED_MASKED_STRIDED: pred.store.if46:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP133]], i8* [[TMP132]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP157]], i8* [[TMP156]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = sub i8 0, [[TMP158]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i32> [[TMP55]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP160]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP159]], i8* [[TMP161]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE47]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE47]]
	; DISABLED_MASKED_STRIDED: pred.store.continue47:			; DISABLED_MASKED_STRIDED: pred.store.continue47:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP134]], label [[PRED_STORE_IF48:%.*]], label [[PRED_STORE_CONTINUE49:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP162]], label [[PRED_STORE_IF48:%.*]], label [[PRED_STORE_CONTINUE49]]
	; DISABLED_MASKED_STRIDED: pred.store.if48:			; DISABLED_MASKED_STRIDED: pred.store.if48:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP135]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP137]], i8* [[TMP136]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP165]], i8* [[TMP164]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE49]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7
	; DISABLED_MASKED_STRIDED: pred.store.continue49:			; DISABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = sub i8 0, [[TMP166]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = sub <8 x i8> zeroinitializer, [[TMP105]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP139]], label [[PRED_STORE_IF50:%.*]], label [[PRED_STORE_CONTINUE51:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if50:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = extractelement <8 x i32> [[TMP55]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP140]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i8> [[TMP138]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP142]], i8* [[TMP141]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE51]]
	; DISABLED_MASKED_STRIDED: pred.store.continue51:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP143]], label [[PRED_STORE_IF52:%.*]], label [[PRED_STORE_CONTINUE53:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if52:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i32> [[TMP55]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP144]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i8> [[TMP138]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP146]], i8* [[TMP145]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE53]]
	; DISABLED_MASKED_STRIDED: pred.store.continue53:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP147]], label [[PRED_STORE_IF54:%.*]], label [[PRED_STORE_CONTINUE55:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if54:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = extractelement <8 x i32> [[TMP55]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP148]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i8> [[TMP138]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP150]], i8* [[TMP149]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE55]]
	; DISABLED_MASKED_STRIDED: pred.store.continue55:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP151]], label [[PRED_STORE_IF56:%.*]], label [[PRED_STORE_CONTINUE57:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if56:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i32> [[TMP55]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP152]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i8> [[TMP138]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP154]], i8* [[TMP153]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE57]]
	; DISABLED_MASKED_STRIDED: pred.store.continue57:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP155]], label [[PRED_STORE_IF58:%.*]], label [[PRED_STORE_CONTINUE59:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if58:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = extractelement <8 x i32> [[TMP55]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP156]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i8> [[TMP138]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP158]], i8* [[TMP157]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE59]]
	; DISABLED_MASKED_STRIDED: pred.store.continue59:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP159]], label [[PRED_STORE_IF60:%.*]], label [[PRED_STORE_CONTINUE61:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if60:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i32> [[TMP55]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP160]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i8> [[TMP138]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP162]], i8* [[TMP161]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE61]]
	; DISABLED_MASKED_STRIDED: pred.store.continue61:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP163]], label [[PRED_STORE_IF62:%.*]], label [[PRED_STORE_CONTINUE63:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if62:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = extractelement <8 x i32> [[TMP55]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP164]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = extractelement <8 x i8> [[TMP138]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP166]], i8* [[TMP165]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE63]]
	; DISABLED_MASKED_STRIDED: pred.store.continue63:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP167]], label [[PRED_STORE_IF64:%.*]], label [[PRED_STORE_CONTINUE65]]
	; DISABLED_MASKED_STRIDED: pred.store.if64:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP168:%.*]] = extractelement <8 x i32> [[TMP55]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP168:%.*]] = extractelement <8 x i32> [[TMP55]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP169:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP168]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP169:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP168]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP170:%.*]] = extractelement <8 x i8> [[TMP138]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP167]], i8* [[TMP169]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP170]], i8* [[TMP169]], align 1			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE49]]
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE65]]			; DISABLED_MASKED_STRIDED: pred.store.continue49:
	; DISABLED_MASKED_STRIDED: pred.store.continue65:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8>			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP171:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024			; DISABLED_MASKED_STRIDED-NEXT: [[TMP170:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP171]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP170]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; DISABLED_MASKED_STRIDED: for.body:			; DISABLED_MASKED_STRIDED: for.body:
	; DISABLED_MASKED_STRIDED-NEXT: [[IX_024:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 1024, [[ENTRY:%.*]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[IX_024:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 1024, [[ENTRY:%.*]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[IX_024]], [[CONV]]			; DISABLED_MASKED_STRIDED-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[IX_024]], [[CONV]]
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; DISABLED_MASKED_STRIDED: if.then:			; DISABLED_MASKED_STRIDED: if.then:
	; DISABLED_MASKED_STRIDED-NEXT: [[MUL:%.*]] = shl nuw nsw i32 [[IX_024]], 1			; DISABLED_MASKED_STRIDED-NEXT: [[MUL:%.*]] = shl nuw nsw i32 [[IX_024]], 1
	; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[MUL]]			; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[MUL]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP172:%.*]] = load i8, i8* [[ARRAYIDX]], align 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP171:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[ADD:%.*]] = or i32 [[MUL]], 1			; DISABLED_MASKED_STRIDED-NEXT: [[ADD:%.*]] = or i32 [[MUL]], 1
	; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[ADD]]			; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[ADD]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP173:%.*]] = load i8, i8* [[ARRAYIDX4]], align 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP172:%.*]] = load i8, i8* [[ARRAYIDX4]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[CMP_I:%.*]] = icmp slt i8 [[TMP172]], [[TMP173]]			; DISABLED_MASKED_STRIDED-NEXT: [[CMP_I:%.*]] = icmp slt i8 [[TMP171]], [[TMP172]]
	; DISABLED_MASKED_STRIDED-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[CMP_I]], i8 [[TMP173]], i8 [[TMP172]]			; DISABLED_MASKED_STRIDED-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[CMP_I]], i8 [[TMP172]], i8 [[TMP171]]
	; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[MUL]]			; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[MUL]]
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[SPEC_SELECT_I]], i8* [[ARRAYIDX6]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[SPEC_SELECT_I]], i8* [[ARRAYIDX6]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[SUB:%.*]] = sub i8 0, [[SPEC_SELECT_I]]			; DISABLED_MASKED_STRIDED-NEXT: [[SUB:%.*]] = sub i8 0, [[SPEC_SELECT_I]]
	; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[ADD]]			; DISABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[ADD]]
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[SUB]], i8* [[ARRAYIDX11]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[SUB]], i8* [[ARRAYIDX11]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[FOR_INC]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[FOR_INC]]
	; DISABLED_MASKED_STRIDED: for.inc:			; DISABLED_MASKED_STRIDED: for.inc:
	; DISABLED_MASKED_STRIDED-NEXT: [[INC]] = add nsw i32 [[IX_024]], -1			; DISABLED_MASKED_STRIDED-NEXT: [[INC]] = add nsw i32 [[IX_024]], -1
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ugt i8* [[TMP2]], [[SCEVGEP2]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = icmp ugt i8* [[TMP2]], [[SCEVGEP2]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i1 [[TMP1]], [[TMP3]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = or i1 [[TMP1]], [[TMP3]]
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]
	; ENABLED_MASKED_STRIDED: vector.ph:			; ENABLED_MASKED_STRIDED: vector.ph:
	; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[CONV]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; ENABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
	; ENABLED_MASKED_STRIDED: vector.body:			; ENABLED_MASKED_STRIDED: vector.body:
	; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE65:%.*]] ]			; ENABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE49:%.*]] ]
	; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 1024, i32 1023, i32 1022, i32 1021, i32 1020, i32 1019, i32 1018, i32 1017>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE65]] ]			; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 1024, i32 1023, i32 1022, i32 1021, i32 1020, i32 1019, i32 1018, i32 1017>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE49]] ]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = icmp ugt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; ENABLED_MASKED_STRIDED-NEXT: [[TMP6:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP7]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; ENABLED_MASKED_STRIDED: pred.load.if:			; ENABLED_MASKED_STRIDED: pred.load.if:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP8]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP8]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP9]], align 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP9]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = select <8 x i1> [[TMP104]], <8 x i8> [[TMP103]], <8 x i8> [[TMP54]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = select <8 x i1> [[TMP104]], <8 x i8> [[TMP103]], <8 x i8> [[TMP54]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP106]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP106]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if:			; ENABLED_MASKED_STRIDED: pred.store.if:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0			; ENABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP109]], i8* [[TMP108]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP109]], i8* [[TMP108]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i8> [[TMP105]], i64 0
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = sub i8 0, [[TMP110]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i32> [[TMP55]], i64 0
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP112]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP111]], i8* [[TMP113]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]
	; ENABLED_MASKED_STRIDED: pred.store.continue:			; ENABLED_MASKED_STRIDED: pred.store.continue:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP110]], label [[PRED_STORE_IF36:%.*]], label [[PRED_STORE_CONTINUE37:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP114]], label [[PRED_STORE_IF36:%.*]], label [[PRED_STORE_CONTINUE37:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if36:			; ENABLED_MASKED_STRIDED: pred.store.if36:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP111]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP113]], i8* [[TMP112]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP117]], i8* [[TMP116]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i8> [[TMP105]], i64 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = sub i8 0, [[TMP118]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i32> [[TMP55]], i64 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP120]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP119]], i8* [[TMP121]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE37]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE37]]
	; ENABLED_MASKED_STRIDED: pred.store.continue37:			; ENABLED_MASKED_STRIDED: pred.store.continue37:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2			; ENABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP114]], label [[PRED_STORE_IF38:%.*]], label [[PRED_STORE_CONTINUE39:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP122]], label [[PRED_STORE_IF38:%.*]], label [[PRED_STORE_CONTINUE39:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if38:			; ENABLED_MASKED_STRIDED: pred.store.if38:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; ENABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2			; ENABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP117]], i8* [[TMP116]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP125]], i8* [[TMP124]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i8> [[TMP105]], i64 2
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = sub i8 0, [[TMP126]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i32> [[TMP55]], i64 2
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP128]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP127]], i8* [[TMP129]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE39]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE39]]
	; ENABLED_MASKED_STRIDED: pred.store.continue39:			; ENABLED_MASKED_STRIDED: pred.store.continue39:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3			; ENABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP118]], label [[PRED_STORE_IF40:%.*]], label [[PRED_STORE_CONTINUE41:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP130]], label [[PRED_STORE_IF40:%.*]], label [[PRED_STORE_CONTINUE41:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if40:			; ENABLED_MASKED_STRIDED: pred.store.if40:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; ENABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP119]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3			; ENABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP121]], i8* [[TMP120]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP133]], i8* [[TMP132]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i8> [[TMP105]], i64 3
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = sub i8 0, [[TMP134]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i32> [[TMP55]], i64 3
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP136]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP135]], i8* [[TMP137]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE41]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE41]]
	; ENABLED_MASKED_STRIDED: pred.store.continue41:			; ENABLED_MASKED_STRIDED: pred.store.continue41:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4			; ENABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP122]], label [[PRED_STORE_IF42:%.*]], label [[PRED_STORE_CONTINUE43:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP138]], label [[PRED_STORE_IF42:%.*]], label [[PRED_STORE_CONTINUE43:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if42:			; ENABLED_MASKED_STRIDED: pred.store.if42:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; ENABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4			; ENABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP125]], i8* [[TMP124]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP141]], i8* [[TMP140]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i8> [[TMP105]], i64 4
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = sub i8 0, [[TMP142]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i32> [[TMP55]], i64 4
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP144]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP143]], i8* [[TMP145]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE43]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE43]]
	; ENABLED_MASKED_STRIDED: pred.store.continue43:			; ENABLED_MASKED_STRIDED: pred.store.continue43:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5			; ENABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP126]], label [[PRED_STORE_IF44:%.*]], label [[PRED_STORE_CONTINUE45:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP146]], label [[PRED_STORE_IF44:%.*]], label [[PRED_STORE_CONTINUE45:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if44:			; ENABLED_MASKED_STRIDED: pred.store.if44:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; ENABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP127]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5			; ENABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP129]], i8* [[TMP128]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP149]], i8* [[TMP148]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i8> [[TMP105]], i64 5
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = sub i8 0, [[TMP150]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i32> [[TMP55]], i64 5
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP152]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP151]], i8* [[TMP153]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE45]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE45]]
	; ENABLED_MASKED_STRIDED: pred.store.continue45:			; ENABLED_MASKED_STRIDED: pred.store.continue45:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6			; ENABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP130]], label [[PRED_STORE_IF46:%.*]], label [[PRED_STORE_CONTINUE47:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP154]], label [[PRED_STORE_IF46:%.*]], label [[PRED_STORE_CONTINUE47:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if46:			; ENABLED_MASKED_STRIDED: pred.store.if46:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; ENABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6			; ENABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP133]], i8* [[TMP132]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP157]], i8* [[TMP156]], align 1
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i8> [[TMP105]], i64 6
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = sub i8 0, [[TMP158]]
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i32> [[TMP55]], i64 6
				; ENABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP160]]
				; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP159]], i8* [[TMP161]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE47]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE47]]
	; ENABLED_MASKED_STRIDED: pred.store.continue47:			; ENABLED_MASKED_STRIDED: pred.store.continue47:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7			; ENABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP134]], label [[PRED_STORE_IF48:%.*]], label [[PRED_STORE_CONTINUE49:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP162]], label [[PRED_STORE_IF48:%.*]], label [[PRED_STORE_CONTINUE49]]
	; ENABLED_MASKED_STRIDED: pred.store.if48:			; ENABLED_MASKED_STRIDED: pred.store.if48:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; ENABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP135]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7			; ENABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP137]], i8* [[TMP136]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP165]], i8* [[TMP164]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE49]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = extractelement <8 x i8> [[TMP105]], i64 7
	; ENABLED_MASKED_STRIDED: pred.store.continue49:			; ENABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = sub i8 0, [[TMP166]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = sub <8 x i8> zeroinitializer, [[TMP105]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i1> [[TMP5]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP139]], label [[PRED_STORE_IF50:%.*]], label [[PRED_STORE_CONTINUE51:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if50:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = extractelement <8 x i32> [[TMP55]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP140]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i8> [[TMP138]], i64 0
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP142]], i8* [[TMP141]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE51]]
	; ENABLED_MASKED_STRIDED: pred.store.continue51:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i1> [[TMP5]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP143]], label [[PRED_STORE_IF52:%.*]], label [[PRED_STORE_CONTINUE53:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if52:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i32> [[TMP55]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP144]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i8> [[TMP138]], i64 1
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP146]], i8* [[TMP145]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE53]]
	; ENABLED_MASKED_STRIDED: pred.store.continue53:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i1> [[TMP5]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP147]], label [[PRED_STORE_IF54:%.*]], label [[PRED_STORE_CONTINUE55:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if54:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = extractelement <8 x i32> [[TMP55]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP148]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i8> [[TMP138]], i64 2
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP150]], i8* [[TMP149]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE55]]
	; ENABLED_MASKED_STRIDED: pred.store.continue55:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i1> [[TMP5]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP151]], label [[PRED_STORE_IF56:%.*]], label [[PRED_STORE_CONTINUE57:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if56:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i32> [[TMP55]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP152]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i8> [[TMP138]], i64 3
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP154]], i8* [[TMP153]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE57]]
	; ENABLED_MASKED_STRIDED: pred.store.continue57:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i1> [[TMP5]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP155]], label [[PRED_STORE_IF58:%.*]], label [[PRED_STORE_CONTINUE59:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if58:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = extractelement <8 x i32> [[TMP55]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP156]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i8> [[TMP138]], i64 4
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP158]], i8* [[TMP157]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE59]]
	; ENABLED_MASKED_STRIDED: pred.store.continue59:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i1> [[TMP5]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP159]], label [[PRED_STORE_IF60:%.*]], label [[PRED_STORE_CONTINUE61:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if60:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i32> [[TMP55]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP160]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i8> [[TMP138]], i64 5
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP162]], i8* [[TMP161]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE61]]
	; ENABLED_MASKED_STRIDED: pred.store.continue61:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i1> [[TMP5]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP163]], label [[PRED_STORE_IF62:%.*]], label [[PRED_STORE_CONTINUE63:%.*]]
	; ENABLED_MASKED_STRIDED: pred.store.if62:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = extractelement <8 x i32> [[TMP55]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP164]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = extractelement <8 x i8> [[TMP138]], i64 6
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP166]], i8* [[TMP165]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE63]]
	; ENABLED_MASKED_STRIDED: pred.store.continue63:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = extractelement <8 x i1> [[TMP5]], i64 7
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP167]], label [[PRED_STORE_IF64:%.*]], label [[PRED_STORE_CONTINUE65]]
	; ENABLED_MASKED_STRIDED: pred.store.if64:
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP168:%.*]] = extractelement <8 x i32> [[TMP55]], i64 7			; ENABLED_MASKED_STRIDED-NEXT: [[TMP168:%.*]] = extractelement <8 x i32> [[TMP55]], i64 7
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP169:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP168]]			; ENABLED_MASKED_STRIDED-NEXT: [[TMP169:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP168]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP170:%.*]] = extractelement <8 x i8> [[TMP138]], i64 7			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP167]], i8* [[TMP169]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[TMP170]], i8* [[TMP169]], align 1			; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE49]]
	; ENABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE65]]			; ENABLED_MASKED_STRIDED: pred.store.continue49:
	; ENABLED_MASKED_STRIDED: pred.store.continue65:
	; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8			; ENABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
	; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8>			; ENABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8, i32 -8>
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP171:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024			; ENABLED_MASKED_STRIDED-NEXT: [[TMP170:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP171]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[TMP170]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; ENABLED_MASKED_STRIDED: for.body:			; ENABLED_MASKED_STRIDED: for.body:
	; ENABLED_MASKED_STRIDED-NEXT: [[IX_024:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 1024, [[ENTRY:%.*]] ]			; ENABLED_MASKED_STRIDED-NEXT: [[IX_024:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 1024, [[ENTRY:%.*]] ]
	; ENABLED_MASKED_STRIDED-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[IX_024]], [[CONV]]			; ENABLED_MASKED_STRIDED-NEXT: [[CMP1:%.*]] = icmp ugt i32 [[IX_024]], [[CONV]]
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; ENABLED_MASKED_STRIDED: if.then:			; ENABLED_MASKED_STRIDED: if.then:
	; ENABLED_MASKED_STRIDED-NEXT: [[MUL:%.*]] = shl nuw nsw i32 [[IX_024]], 1			; ENABLED_MASKED_STRIDED-NEXT: [[MUL:%.*]] = shl nuw nsw i32 [[IX_024]], 1
	; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[MUL]]			; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[MUL]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP172:%.*]] = load i8, i8* [[ARRAYIDX]], align 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP171:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: [[ADD:%.*]] = or i32 [[MUL]], 1			; ENABLED_MASKED_STRIDED-NEXT: [[ADD:%.*]] = or i32 [[MUL]], 1
	; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[ADD]]			; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i8, i8* [[P]], i32 [[ADD]]
	; ENABLED_MASKED_STRIDED-NEXT: [[TMP173:%.*]] = load i8, i8* [[ARRAYIDX4]], align 1			; ENABLED_MASKED_STRIDED-NEXT: [[TMP172:%.*]] = load i8, i8* [[ARRAYIDX4]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: [[CMP_I:%.*]] = icmp slt i8 [[TMP172]], [[TMP173]]			; ENABLED_MASKED_STRIDED-NEXT: [[CMP_I:%.*]] = icmp slt i8 [[TMP171]], [[TMP172]]
	; ENABLED_MASKED_STRIDED-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[CMP_I]], i8 [[TMP173]], i8 [[TMP172]]			; ENABLED_MASKED_STRIDED-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[CMP_I]], i8 [[TMP172]], i8 [[TMP171]]
	; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[MUL]]			; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[MUL]]
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[SPEC_SELECT_I]], i8* [[ARRAYIDX6]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[SPEC_SELECT_I]], i8* [[ARRAYIDX6]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: [[SUB:%.*]] = sub i8 0, [[SPEC_SELECT_I]]			; ENABLED_MASKED_STRIDED-NEXT: [[SUB:%.*]] = sub i8 0, [[SPEC_SELECT_I]]
	; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[ADD]]			; ENABLED_MASKED_STRIDED-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[ADD]]
	; ENABLED_MASKED_STRIDED-NEXT: store i8 [[SUB]], i8* [[ARRAYIDX11]], align 1			; ENABLED_MASKED_STRIDED-NEXT: store i8 [[SUB]], i8* [[ARRAYIDX11]], align 1
	; ENABLED_MASKED_STRIDED-NEXT: br label [[FOR_INC]]			; ENABLED_MASKED_STRIDED-NEXT: br label [[FOR_INC]]
	; ENABLED_MASKED_STRIDED: for.inc:			; ENABLED_MASKED_STRIDED: for.inc:
	; ENABLED_MASKED_STRIDED-NEXT: [[INC]] = add nsw i32 [[IX_024]], -1			; ENABLED_MASKED_STRIDED-NEXT: [[INC]] = add nsw i32 [[IX_024]], -1
	[72 lines not shown]
	; DISABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8			; DISABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8
	; DISABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1			; DISABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i32> poison, i32 [[GUARD:%.*]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i32> poison, i32 [[GUARD:%.*]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT1]], <8 x i32> poison, <8 x i32> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
	; DISABLED_MASKED_STRIDED: vector.body:			; DISABLED_MASKED_STRIDED: vector.body:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE62:%.*]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE46:%.*]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE62]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE46]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT2]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = icmp sgt <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT2]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP0]], <8 x i1> [[TMP1]], <8 x i1> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP0]], <8 x i1> [[TMP1]], <8 x i1> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = extractelement <8 x i1> [[TMP3]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = extractelement <8 x i1> [[TMP3]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.load.if:			; DISABLED_MASKED_STRIDED: pred.load.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i64 0
	[158 lines not shown]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = select <8 x i1> [[TMP101]], <8 x i8> [[TMP100]], <8 x i8> [[TMP51]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = select <8 x i1> [[TMP101]], <8 x i8> [[TMP100]], <8 x i8> [[TMP51]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = extractelement <8 x i1> [[TMP3]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = extractelement <8 x i1> [[TMP3]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP103]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP103]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if:			; DISABLED_MASKED_STRIDED: pred.store.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i32> [[TMP2]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i32> [[TMP2]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP104]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP104]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i8> [[TMP102]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i8> [[TMP102]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP106]], i8* [[TMP105]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP106]], i8* [[TMP105]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i8> [[TMP102]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = sub i8 0, [[TMP107]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i32> [[TMP52]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP109]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP108]], i8* [[TMP110]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]
	; DISABLED_MASKED_STRIDED: pred.store.continue:			; DISABLED_MASKED_STRIDED: pred.store.continue:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i1> [[TMP3]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = extractelement <8 x i1> [[TMP3]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP107]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP111]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if33:			; DISABLED_MASKED_STRIDED: pred.store.if33:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = extractelement <8 x i32> [[TMP2]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i32> [[TMP2]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP108]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP112]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i8> [[TMP102]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i8> [[TMP102]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP110]], i8* [[TMP109]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP114]], i8* [[TMP113]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i8> [[TMP102]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = sub i8 0, [[TMP115]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i32> [[TMP52]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP117]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP116]], i8* [[TMP118]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]
	; DISABLED_MASKED_STRIDED: pred.store.continue34:			; DISABLED_MASKED_STRIDED: pred.store.continue34:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = extractelement <8 x i1> [[TMP3]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = extractelement <8 x i1> [[TMP3]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP111]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP119]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if35:			; DISABLED_MASKED_STRIDED: pred.store.if35:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i32> [[TMP2]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i32> [[TMP2]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP112]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP120]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i8> [[TMP102]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i8> [[TMP102]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP114]], i8* [[TMP113]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP122]], i8* [[TMP121]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i8> [[TMP102]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = sub i8 0, [[TMP123]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i32> [[TMP52]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP125]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP124]], i8* [[TMP126]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]
	; DISABLED_MASKED_STRIDED: pred.store.continue36:			; DISABLED_MASKED_STRIDED: pred.store.continue36:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i1> [[TMP3]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = extractelement <8 x i1> [[TMP3]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP115]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP127]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if37:			; DISABLED_MASKED_STRIDED: pred.store.if37:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = extractelement <8 x i32> [[TMP2]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i32> [[TMP2]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP116]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP128]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i8> [[TMP102]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i8> [[TMP102]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP118]], i8* [[TMP117]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP130]], i8* [[TMP129]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i8> [[TMP102]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = sub i8 0, [[TMP131]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i32> [[TMP52]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP133]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP132]], i8* [[TMP134]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]
	; DISABLED_MASKED_STRIDED: pred.store.continue38:			; DISABLED_MASKED_STRIDED: pred.store.continue38:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = extractelement <8 x i1> [[TMP3]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = extractelement <8 x i1> [[TMP3]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP119]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP135]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if39:			; DISABLED_MASKED_STRIDED: pred.store.if39:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i32> [[TMP2]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i32> [[TMP2]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP120]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP136]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i8> [[TMP102]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = extractelement <8 x i8> [[TMP102]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP122]], i8* [[TMP121]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP138]], i8* [[TMP137]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i8> [[TMP102]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = sub i8 0, [[TMP139]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i32> [[TMP52]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP141]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP140]], i8* [[TMP142]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]
	; DISABLED_MASKED_STRIDED: pred.store.continue40:			; DISABLED_MASKED_STRIDED: pred.store.continue40:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i1> [[TMP3]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i1> [[TMP3]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP123]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP143]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if41:			; DISABLED_MASKED_STRIDED: pred.store.if41:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = extractelement <8 x i32> [[TMP2]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i32> [[TMP2]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP124]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP144]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i8> [[TMP102]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i8> [[TMP102]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP126]], i8* [[TMP125]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP146]], i8* [[TMP145]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i8> [[TMP102]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = sub i8 0, [[TMP147]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i32> [[TMP52]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP149]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP148]], i8* [[TMP150]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]
	; DISABLED_MASKED_STRIDED: pred.store.continue42:			; DISABLED_MASKED_STRIDED: pred.store.continue42:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = extractelement <8 x i1> [[TMP3]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i1> [[TMP3]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP127]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP151]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if43:			; DISABLED_MASKED_STRIDED: pred.store.if43:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i32> [[TMP2]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i32> [[TMP2]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP128]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP152]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i8> [[TMP102]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i8> [[TMP102]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP130]], i8* [[TMP129]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP154]], i8* [[TMP153]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i8> [[TMP102]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = sub i8 0, [[TMP155]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i32> [[TMP52]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP157]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP156]], i8* [[TMP158]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]
	; DISABLED_MASKED_STRIDED: pred.store.continue44:			; DISABLED_MASKED_STRIDED: pred.store.continue44:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i1> [[TMP3]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i1> [[TMP3]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP131]], label [[PRED_STORE_IF45:%.*]], label [[PRED_STORE_CONTINUE46:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP159]], label [[PRED_STORE_IF45:%.*]], label [[PRED_STORE_CONTINUE46]]
	; DISABLED_MASKED_STRIDED: pred.store.if45:			; DISABLED_MASKED_STRIDED: pred.store.if45:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = extractelement <8 x i32> [[TMP2]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i32> [[TMP2]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP132]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP160]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i8> [[TMP102]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i8> [[TMP102]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP134]], i8* [[TMP133]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP162]], i8* [[TMP161]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE46]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i8> [[TMP102]], i64 7
	; DISABLED_MASKED_STRIDED: pred.store.continue46:			; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = sub i8 0, [[TMP163]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = sub <8 x i8> zeroinitializer, [[TMP102]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i1> [[TMP3]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP136]], label [[PRED_STORE_IF47:%.*]], label [[PRED_STORE_CONTINUE48:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if47:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i32> [[TMP52]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP137]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i8> [[TMP135]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP139]], i8* [[TMP138]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE48]]
	; DISABLED_MASKED_STRIDED: pred.store.continue48:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = extractelement <8 x i1> [[TMP3]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP140]], label [[PRED_STORE_IF49:%.*]], label [[PRED_STORE_CONTINUE50:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if49:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i32> [[TMP52]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP141]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i8> [[TMP135]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP143]], i8* [[TMP142]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE50]]
	; DISABLED_MASKED_STRIDED: pred.store.continue50:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i1> [[TMP3]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP144]], label [[PRED_STORE_IF51:%.*]], label [[PRED_STORE_CONTINUE52:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if51:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = extractelement <8 x i32> [[TMP52]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP145]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i8> [[TMP135]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP147]], i8* [[TMP146]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE52]]
	; DISABLED_MASKED_STRIDED: pred.store.continue52:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = extractelement <8 x i1> [[TMP3]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP148]], label [[PRED_STORE_IF53:%.*]], label [[PRED_STORE_CONTINUE54:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if53:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i32> [[TMP52]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP149]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i8> [[TMP135]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP151]], i8* [[TMP150]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE54]]
	; DISABLED_MASKED_STRIDED: pred.store.continue54:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i1> [[TMP3]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP152]], label [[PRED_STORE_IF55:%.*]], label [[PRED_STORE_CONTINUE56:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if55:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = extractelement <8 x i32> [[TMP52]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP153]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i8> [[TMP135]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP155]], i8* [[TMP154]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE56]]
	; DISABLED_MASKED_STRIDED: pred.store.continue56:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = extractelement <8 x i1> [[TMP3]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP156]], label [[PRED_STORE_IF57:%.*]], label [[PRED_STORE_CONTINUE58:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if57:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i32> [[TMP52]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP157]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i8> [[TMP135]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP159]], i8* [[TMP158]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE58]]
	; DISABLED_MASKED_STRIDED: pred.store.continue58:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i1> [[TMP3]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP160]], label [[PRED_STORE_IF59:%.*]], label [[PRED_STORE_CONTINUE60:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if59:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = extractelement <8 x i32> [[TMP52]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP161]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i8> [[TMP135]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP163]], i8* [[TMP162]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE60]]
	; DISABLED_MASKED_STRIDED: pred.store.continue60:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = extractelement <8 x i1> [[TMP3]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP164]], label [[PRED_STORE_IF61:%.*]], label [[PRED_STORE_CONTINUE62]]
	; DISABLED_MASKED_STRIDED: pred.store.if61:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i32> [[TMP52]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i32> [[TMP52]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP165]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP165]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = extractelement <8 x i8> [[TMP135]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP164]], i8* [[TMP166]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP167]], i8* [[TMP166]], align 1			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE46]]
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE62]]			; DISABLED_MASKED_STRIDED: pred.store.continue46:
	; DISABLED_MASKED_STRIDED: pred.store.continue62:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP168:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP167:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP168]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP167]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; DISABLED_MASKED_STRIDED: for.end:			; DISABLED_MASKED_STRIDED: for.end:
	; DISABLED_MASKED_STRIDED-NEXT: ret void			; DISABLED_MASKED_STRIDED-NEXT: ret void
	;			;
	; ENABLED_MASKED_STRIDED-LABEL: @masked_strided2_unknown_tc(			; ENABLED_MASKED_STRIDED-LABEL: @masked_strided2_unknown_tc(
	; ENABLED_MASKED_STRIDED-NEXT: entry:			; ENABLED_MASKED_STRIDED-NEXT: entry:
	; ENABLED_MASKED_STRIDED-NEXT: [[CMP22:%.*]] = icmp sgt i32 [[N:%.*]], 0			; ENABLED_MASKED_STRIDED-NEXT: [[CMP22:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP22]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP22]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]
	; ENABLED_MASKED_STRIDED: vector.ph:			; ENABLED_MASKED_STRIDED: vector.ph:
	; DISABLED_MASKED_STRIDED: vector.ph:			; DISABLED_MASKED_STRIDED: vector.ph:
	; DISABLED_MASKED_STRIDED-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 7			; DISABLED_MASKED_STRIDED-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 7
	; DISABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8			; DISABLED_MASKED_STRIDED-NEXT: [[N_VEC:%.*]] = and i32 [[N_RND_UP]], -8
	; DISABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1			; DISABLED_MASKED_STRIDED-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = add i32 [[N]], -1
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer			; DISABLED_MASKED_STRIDED-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT]], <8 x i32> poison, <8 x i32> zeroinitializer
	; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[VECTOR_BODY:%.*]]
	; DISABLED_MASKED_STRIDED: vector.body:			; DISABLED_MASKED_STRIDED: vector.body:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE60:%.*]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE44:%.*]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE60]] ]			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND:%.*]] = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE44]] ]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP0:%.*]] = icmp ule <8 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; DISABLED_MASKED_STRIDED-NEXT: [[TMP1:%.*]] = shl nuw nsw <8 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.load.if:			; DISABLED_MASKED_STRIDED: pred.load.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP3]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[P:%.*]], i32 [[TMP3]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP100:%.*]] = select <8 x i1> [[TMP99]], <8 x i8> [[TMP98]], <8 x i8> [[TMP49]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP100:%.*]] = select <8 x i1> [[TMP99]], <8 x i8> [[TMP98]], <8 x i8> [[TMP49]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP101:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP101:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP101]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP101]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if:			; DISABLED_MASKED_STRIDED: pred.store.if:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP102:%.*]] = extractelement <8 x i32> [[TMP1]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP102]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP103:%.*]] = getelementptr inbounds i8, i8* [[Q:%.*]], i32 [[TMP102]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0			; DISABLED_MASKED_STRIDED-NEXT: [[TMP104:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP104]], i8* [[TMP103]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP104]], i8* [[TMP103]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = extractelement <8 x i8> [[TMP100]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = sub i8 0, [[TMP105]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = extractelement <8 x i32> [[TMP50]], i64 0
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP107]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP106]], i8* [[TMP108]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE]]
	; DISABLED_MASKED_STRIDED: pred.store.continue:			; DISABLED_MASKED_STRIDED: pred.store.continue:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP105:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP105]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP109]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if31:			; DISABLED_MASKED_STRIDED: pred.store.if31:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP106:%.*]] = extractelement <8 x i32> [[TMP1]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i32> [[TMP1]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP107:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP106]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP110]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP108:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1			; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP108]], i8* [[TMP107]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP112]], i8* [[TMP111]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i8> [[TMP100]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = sub i8 0, [[TMP113]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = extractelement <8 x i32> [[TMP50]], i64 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP115]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP114]], i8* [[TMP116]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE32]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE32]]
	; DISABLED_MASKED_STRIDED: pred.store.continue32:			; DISABLED_MASKED_STRIDED: pred.store.continue32:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP109:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP109]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP117]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if33:			; DISABLED_MASKED_STRIDED: pred.store.if33:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP110:%.*]] = extractelement <8 x i32> [[TMP1]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i32> [[TMP1]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP111:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP110]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP118]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP112:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2			; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP112]], i8* [[TMP111]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP120]], i8* [[TMP119]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i8> [[TMP100]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = sub i8 0, [[TMP121]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = extractelement <8 x i32> [[TMP50]], i64 2
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP123]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP122]], i8* [[TMP124]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE34]]
	; DISABLED_MASKED_STRIDED: pred.store.continue34:			; DISABLED_MASKED_STRIDED: pred.store.continue34:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP113:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP113]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP125]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if35:			; DISABLED_MASKED_STRIDED: pred.store.if35:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP114:%.*]] = extractelement <8 x i32> [[TMP1]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i32> [[TMP1]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP115:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP114]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP126]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP116:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3			; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP116]], i8* [[TMP115]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP128]], i8* [[TMP127]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i8> [[TMP100]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = sub i8 0, [[TMP129]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = extractelement <8 x i32> [[TMP50]], i64 3
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP131]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP130]], i8* [[TMP132]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE36]]
	; DISABLED_MASKED_STRIDED: pred.store.continue36:			; DISABLED_MASKED_STRIDED: pred.store.continue36:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP117:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP117]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP133]], label [[PRED_STORE_IF37:%.*]], label [[PRED_STORE_CONTINUE38:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if37:			; DISABLED_MASKED_STRIDED: pred.store.if37:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP118:%.*]] = extractelement <8 x i32> [[TMP1]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i32> [[TMP1]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP119:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP118]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP134]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP120:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4			; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP120]], i8* [[TMP119]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP136]], i8* [[TMP135]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP100]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = sub i8 0, [[TMP137]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP50]], i64 4
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP138]], i8* [[TMP140]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE38]]
	; DISABLED_MASKED_STRIDED: pred.store.continue38:			; DISABLED_MASKED_STRIDED: pred.store.continue38:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP121:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP121]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP141]], label [[PRED_STORE_IF39:%.*]], label [[PRED_STORE_CONTINUE40:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if39:			; DISABLED_MASKED_STRIDED: pred.store.if39:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP122:%.*]] = extractelement <8 x i32> [[TMP1]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i32> [[TMP1]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP123:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP122]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP142]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP124:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5			; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP124]], i8* [[TMP123]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP144]], i8* [[TMP143]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = extractelement <8 x i8> [[TMP100]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = sub i8 0, [[TMP145]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP50]], i64 5
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP146]], i8* [[TMP148]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE40]]
	; DISABLED_MASKED_STRIDED: pred.store.continue40:			; DISABLED_MASKED_STRIDED: pred.store.continue40:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP125:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP125]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP149]], label [[PRED_STORE_IF41:%.*]], label [[PRED_STORE_CONTINUE42:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if41:			; DISABLED_MASKED_STRIDED: pred.store.if41:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP126:%.*]] = extractelement <8 x i32> [[TMP1]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i32> [[TMP1]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP127:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP126]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP150]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP128:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6			; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP128]], i8* [[TMP127]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP152]], i8* [[TMP151]], align 1
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = extractelement <8 x i8> [[TMP100]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = sub i8 0, [[TMP153]]
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP50]], i64 6
				; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
				; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP154]], i8* [[TMP156]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE42]]
	; DISABLED_MASKED_STRIDED: pred.store.continue42:			; DISABLED_MASKED_STRIDED: pred.store.continue42:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP129:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP129]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44:%.*]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP157]], label [[PRED_STORE_IF43:%.*]], label [[PRED_STORE_CONTINUE44]]
	; DISABLED_MASKED_STRIDED: pred.store.if43:			; DISABLED_MASKED_STRIDED: pred.store.if43:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP130:%.*]] = extractelement <8 x i32> [[TMP1]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i32> [[TMP1]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP131:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP130]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP158]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP132:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP132]], i8* [[TMP131]], align 1			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP160]], i8* [[TMP159]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = extractelement <8 x i8> [[TMP100]], i64 7
	; DISABLED_MASKED_STRIDED: pred.store.continue44:			; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = sub i8 0, [[TMP161]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP133:%.*]] = sub <8 x i8> zeroinitializer, [[TMP100]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP134:%.*]] = extractelement <8 x i1> [[TMP0]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP134]], label [[PRED_STORE_IF45:%.*]], label [[PRED_STORE_CONTINUE46:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if45:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP135:%.*]] = extractelement <8 x i32> [[TMP50]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP136:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP135]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP137:%.*]] = extractelement <8 x i8> [[TMP133]], i64 0
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP137]], i8* [[TMP136]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE46]]
	; DISABLED_MASKED_STRIDED: pred.store.continue46:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP138:%.*]] = extractelement <8 x i1> [[TMP0]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP138]], label [[PRED_STORE_IF47:%.*]], label [[PRED_STORE_CONTINUE48:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if47:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP139:%.*]] = extractelement <8 x i32> [[TMP50]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP140:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP139]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP141:%.*]] = extractelement <8 x i8> [[TMP133]], i64 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP141]], i8* [[TMP140]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE48]]
	; DISABLED_MASKED_STRIDED: pred.store.continue48:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP142:%.*]] = extractelement <8 x i1> [[TMP0]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP142]], label [[PRED_STORE_IF49:%.*]], label [[PRED_STORE_CONTINUE50:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if49:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP143:%.*]] = extractelement <8 x i32> [[TMP50]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP144:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP143]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP145:%.*]] = extractelement <8 x i8> [[TMP133]], i64 2
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP145]], i8* [[TMP144]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE50]]
	; DISABLED_MASKED_STRIDED: pred.store.continue50:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP146:%.*]] = extractelement <8 x i1> [[TMP0]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP146]], label [[PRED_STORE_IF51:%.*]], label [[PRED_STORE_CONTINUE52:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if51:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP147:%.*]] = extractelement <8 x i32> [[TMP50]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP148:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP147]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP149:%.*]] = extractelement <8 x i8> [[TMP133]], i64 3
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP149]], i8* [[TMP148]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE52]]
	; DISABLED_MASKED_STRIDED: pred.store.continue52:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP150:%.*]] = extractelement <8 x i1> [[TMP0]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP150]], label [[PRED_STORE_IF53:%.*]], label [[PRED_STORE_CONTINUE54:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if53:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP151:%.*]] = extractelement <8 x i32> [[TMP50]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP152:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP151]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP153:%.*]] = extractelement <8 x i8> [[TMP133]], i64 4
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP153]], i8* [[TMP152]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE54]]
	; DISABLED_MASKED_STRIDED: pred.store.continue54:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP154:%.*]] = extractelement <8 x i1> [[TMP0]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP154]], label [[PRED_STORE_IF55:%.*]], label [[PRED_STORE_CONTINUE56:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if55:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP155:%.*]] = extractelement <8 x i32> [[TMP50]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP156:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP155]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP157:%.*]] = extractelement <8 x i8> [[TMP133]], i64 5
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP157]], i8* [[TMP156]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE56]]
	; DISABLED_MASKED_STRIDED: pred.store.continue56:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP158:%.*]] = extractelement <8 x i1> [[TMP0]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP158]], label [[PRED_STORE_IF57:%.*]], label [[PRED_STORE_CONTINUE58:%.*]]
	; DISABLED_MASKED_STRIDED: pred.store.if57:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP159:%.*]] = extractelement <8 x i32> [[TMP50]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP160:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP159]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP161:%.*]] = extractelement <8 x i8> [[TMP133]], i64 6
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP161]], i8* [[TMP160]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE58]]
	; DISABLED_MASKED_STRIDED: pred.store.continue58:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP162:%.*]] = extractelement <8 x i1> [[TMP0]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP162]], label [[PRED_STORE_IF59:%.*]], label [[PRED_STORE_CONTINUE60]]
	; DISABLED_MASKED_STRIDED: pred.store.if59:
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP50]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: [[TMP163:%.*]] = extractelement <8 x i32> [[TMP50]], i64 7
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP164:%.*]] = getelementptr inbounds i8, i8* [[Q]], i32 [[TMP163]]
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = extractelement <8 x i8> [[TMP133]], i64 7			; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP162]], i8* [[TMP164]], align 1
	; DISABLED_MASKED_STRIDED-NEXT: store i8 [[TMP165]], i8* [[TMP164]], align 1			; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE44]]
	; DISABLED_MASKED_STRIDED-NEXT: br label [[PRED_STORE_CONTINUE60]]			; DISABLED_MASKED_STRIDED: pred.store.continue44:
	; DISABLED_MASKED_STRIDED: pred.store.continue60:
	; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8			; DISABLED_MASKED_STRIDED-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 8
	; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>			; DISABLED_MASKED_STRIDED-NEXT: [[VEC_IND_NEXT]] = add <8 x i32> [[VEC_IND]], <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
	; DISABLED_MASKED_STRIDED-NEXT: [[TMP166:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; DISABLED_MASKED_STRIDED-NEXT: [[TMP165:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP166]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; DISABLED_MASKED_STRIDED-NEXT: br i1 [[TMP165]], label [[FOR_END]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; DISABLED_MASKED_STRIDED: for.end:			; DISABLED_MASKED_STRIDED: for.end:
	; DISABLED_MASKED_STRIDED-NEXT: ret void			; DISABLED_MASKED_STRIDED-NEXT: ret void
	;			;
	; ENABLED_MASKED_STRIDED-LABEL: @unconditional_masked_strided2_unknown_tc(			; ENABLED_MASKED_STRIDED-LABEL: @unconditional_masked_strided2_unknown_tc(
	; ENABLED_MASKED_STRIDED-NEXT: entry:			; ENABLED_MASKED_STRIDED-NEXT: entry:
	; ENABLED_MASKED_STRIDED-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[N:%.*]], 0			; ENABLED_MASKED_STRIDED-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP20]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]			; ENABLED_MASKED_STRIDED-NEXT: br i1 [[CMP20]], label [[VECTOR_PH:%.*]], label [[FOR_END:%.*]]
	; ENABLED_MASKED_STRIDED: vector.ph:			; ENABLED_MASKED_STRIDED: vector.ph:

llvm/test/Transforms/LoopVectorize/if-pred-stores.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s --check-prefix=UNROLL			; RUN: opt -S -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s --check-prefix=UNROLL
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY			; RUN: opt -S -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -verify-loop-info -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s --check-prefix=VEC			; RUN: opt -S -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -verify-loop-info -simplifycfg -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck %s --check-prefix=VEC

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; Test predication of stores.			; Test predication of stores.
	define i32 @test(i32* nocapture %f) #0 {			define i32 @test(i32* nocapture %f) #0 {
	; UNROLL-LABEL: @test(			; UNROLL-LABEL: @test(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]

llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -loop-vectorize -vectorize-num-stores-pred=2 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Vectorization with dependence checks.			; Vectorization with dependence checks.

	; Check that a non-power-of-2 MaxVF, calculated based on maximum safe distance,			; Check that a non-power-of-2 MaxVF, calculated based on maximum safe distance,
	; does not lead fold-tail to think that no tail will be generated for any chosen			; does not lead fold-tail to think that no tail will be generated for any chosen
	; (power of 2) VF.			; (power of 2) VF.
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[TMP12]]
	; CHECK-NEXT: store i8 7, i8* [[TMP13]], align 8			; CHECK-NEXT: store i8 7, i8* [[TMP13]], align 8
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]
	; CHECK: pred.store.continue6:			; CHECK: pred.store.continue6:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[J_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[J_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[AJ:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[J]]			; CHECK-NEXT: [[AJ:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[J]]
	; CHECK-NEXT: store i8 69, i8* [[AJ]], align 8			; CHECK-NEXT: store i8 69, i8* [[AJ]], align 8
	; CHECK-NEXT: [[JP3:%.*]] = add nuw nsw i32 3, [[J]]			; CHECK-NEXT: [[JP3:%.*]] = add nuw nsw i32 3, [[J]]
	; CHECK-NEXT: [[AJP3:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[JP3]]			; CHECK-NEXT: [[AJP3:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[JP3]]
	; CHECK-NEXT: store i8 7, i8* [[AJP3]], align 8			; CHECK-NEXT: store i8 7, i8* [[AJP3]], align 8
	; CHECK-NEXT: [[J_NEXT]] = add nuw nsw i32 [[J]], 1			; CHECK-NEXT: [[J_NEXT]] = add nuw nsw i32 [[J]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[J_NEXT]], 15			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[J_NEXT]], 15
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%j = phi i32 [ 0, %entry ], [ %j.next, %for.body ]			%j = phi i32 [ 0, %entry ], [ %j.next, %for.body ]

llvm/test/Transforms/LoopVectorize/optsize.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -S | FileCheck %s -check-prefixes=DEFAULT,PGSO
				; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -pgso -S | FileCheck %s -check-prefixes=DEFAULT,PGSO
				; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -pgso=false -S | FileCheck %s -check-prefixes=DEFAULT,NPGSO
				; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -S | FileCheck %s -check-prefixes=DEFAULT,PGSO
				; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -pgso -S | FileCheck %s -check-prefixes=DEFAULT,PGSO
				; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -pgso=false -S | FileCheck %s -check-prefixes=DEFAULT,NPGSO

				; REQUIRES: asserts

	; This test verifies that the loop vectorizer will NOT produce a tail			; This test verifies that the loop vectorizer will NOT produce a tail
	; loop with the optimize for size or the minimize size attributes.			; loop with the optimize for size or the minimize size attributes.
	; REQUIRES: asserts
	; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -S | FileCheck %s
	; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -pgso -S | FileCheck %s -check-prefix=PGSO
	; RUN: opt < %s -enable-new-pm=0 -loop-vectorize -pgso=false -S | FileCheck %s -check-prefix=NPGSO
	; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -S | FileCheck %s
	; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -pgso -S | FileCheck %s -check-prefix=PGSO
	; RUN: opt < %s -passes='require<profile-summary>,loop-vectorize' -pgso=false -S | FileCheck %s -check-prefix=NPGSO

	target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"			target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"

	@tab = common global [32 x i8] zeroinitializer, align 1			@tab = common global [32 x i8] zeroinitializer, align 1

	define i32 @foo_optsize() #0 {			define i32 @foo_optsize() #0 {
	; CHECK-LABEL: @foo_optsize(			; DEFAULT-LABEL: @foo_optsize(
	; CHECK-NOT: <2 x i8>			; DEFAULT-NEXT: entry:
	; CHECK-NOT: <4 x i8>			; DEFAULT-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; DEFAULT: vector.ph:
				; DEFAULT-NEXT: br label [[VECTOR_BODY:%.*]]
				; DEFAULT: vector.body:
				; DEFAULT-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; DEFAULT-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; DEFAULT-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; DEFAULT-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
				; DEFAULT-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
				; DEFAULT-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
				; DEFAULT-NEXT: [[TMP4:%.*]] = icmp ule <4 x i32> [[VEC_IND]], <i32 202, i32 202, i32 202, i32 202>
				; DEFAULT-NEXT: [[TMP5:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; DEFAULT-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; DEFAULT-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; DEFAULT-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; DEFAULT-NEXT: [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; DEFAULT-NEXT: br i1 [[TMP9]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; DEFAULT: pred.load.if:
				; DEFAULT-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP5]], align 1
				; DEFAULT-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; DEFAULT: pred.load.continue:
				; DEFAULT-NEXT: [[TMP12:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; DEFAULT-NEXT: [[TMP13:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; DEFAULT-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; DEFAULT: pred.load.if1:
				; DEFAULT-NEXT: [[TMP14:%.*]] = load i8, i8* [[TMP6]], align 1
				; DEFAULT-NEXT: [[TMP15:%.*]] = insertelement <4 x i8> [[TMP12]], i8 [[TMP14]], i32 1
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; DEFAULT: pred.load.continue2:
				; DEFAULT-NEXT: [[TMP16:%.*]] = phi <4 x i8> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP15]], [[PRED_LOAD_IF1]] ]
				; DEFAULT-NEXT: [[TMP17:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; DEFAULT-NEXT: br i1 [[TMP17]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; DEFAULT: pred.load.if3:
				; DEFAULT-NEXT: [[TMP18:%.*]] = load i8, i8* [[TMP7]], align 1
				; DEFAULT-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> [[TMP16]], i8 [[TMP18]], i32 2
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; DEFAULT: pred.load.continue4:
				; DEFAULT-NEXT: [[TMP20:%.*]] = phi <4 x i8> [ [[TMP16]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP19]], [[PRED_LOAD_IF3]] ]
				; DEFAULT-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; DEFAULT-NEXT: br i1 [[TMP21]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; DEFAULT: pred.load.if5:
				; DEFAULT-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP8]], align 1
				; DEFAULT-NEXT: [[TMP23:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP22]], i32 3
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; DEFAULT: pred.load.continue6:
				; DEFAULT-NEXT: [[TMP24:%.*]] = phi <4 x i8> [ [[TMP20]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], [[PRED_LOAD_IF5]] ]
				; DEFAULT-NEXT: [[TMP25:%.*]] = icmp eq <4 x i8> [[TMP24]], zeroinitializer
				; DEFAULT-NEXT: [[TMP26:%.*]] = select <4 x i1> [[TMP25]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; DEFAULT-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; DEFAULT-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; DEFAULT: pred.store.if:
				; DEFAULT-NEXT: [[TMP28:%.*]] = extractelement <4 x i8> [[TMP26]], i32 0
				; DEFAULT-NEXT: store i8 [[TMP28]], i8* [[TMP5]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE]]
				; DEFAULT: pred.store.continue:
				; DEFAULT-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; DEFAULT-NEXT: br i1 [[TMP29]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; DEFAULT: pred.store.if7:
				; DEFAULT-NEXT: [[TMP30:%.*]] = extractelement <4 x i8> [[TMP26]], i32 1
				; DEFAULT-NEXT: store i8 [[TMP30]], i8* [[TMP6]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; DEFAULT: pred.store.continue8:
				; DEFAULT-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; DEFAULT-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; DEFAULT: pred.store.if9:
				; DEFAULT-NEXT: [[TMP32:%.*]] = extractelement <4 x i8> [[TMP26]], i32 2
				; DEFAULT-NEXT: store i8 [[TMP32]], i8* [[TMP7]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; DEFAULT: pred.store.continue10:
				; DEFAULT-NEXT: [[TMP33:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; DEFAULT-NEXT: br i1 [[TMP33]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; DEFAULT: pred.store.if11:
				; DEFAULT-NEXT: [[TMP34:%.*]] = extractelement <4 x i8> [[TMP26]], i32 3
				; DEFAULT-NEXT: store i8 [[TMP34]], i8* [[TMP8]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; DEFAULT: pred.store.continue12:
				; DEFAULT-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; DEFAULT-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; DEFAULT-NEXT: [[TMP35:%.*]] = icmp eq i32 [[INDEX_NEXT]], 204
				; DEFAULT-NEXT: br i1 [[TMP35]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
				; DEFAULT: middle.block:
				; DEFAULT-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; DEFAULT: scalar.ph:
				; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 204, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
				; DEFAULT: for.body:
				; DEFAULT-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; DEFAULT-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; DEFAULT-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP36]], 0
				; DEFAULT-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; DEFAULT-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; DEFAULT-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; DEFAULT-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], 202
				; DEFAULT-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
				; DEFAULT: for.end:
				; DEFAULT-NEXT: ret i32 0
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, 202			%exitcond = icmp eq i32 %i.08, 202
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	attributes #0 = { optsize }			attributes #0 = { optsize }

	define i32 @foo_minsize() #1 {			define i32 @foo_minsize() #1 {
	; CHECK-LABEL: @foo_minsize(			; DEFAULT-LABEL: @foo_minsize(
	; CHECK-NOT: <2 x i8>			; DEFAULT-NEXT: entry:
	; CHECK-NOT: <4 x i8>			; DEFAULT-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-LABEL: @foo_pgso(			; DEFAULT: vector.ph:
				; DEFAULT-NEXT: br label [[VECTOR_BODY:%.*]]
				; DEFAULT: vector.body:
				; DEFAULT-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; DEFAULT-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; DEFAULT-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; DEFAULT-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
				; DEFAULT-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
				; DEFAULT-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
				; DEFAULT-NEXT: [[TMP4:%.*]] = icmp ule <4 x i32> [[VEC_IND]], <i32 202, i32 202, i32 202, i32 202>
				; DEFAULT-NEXT: [[TMP5:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; DEFAULT-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; DEFAULT-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; DEFAULT-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; DEFAULT-NEXT: [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; DEFAULT-NEXT: br i1 [[TMP9]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; DEFAULT: pred.load.if:
				; DEFAULT-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP5]], align 1
				; DEFAULT-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; DEFAULT: pred.load.continue:
				; DEFAULT-NEXT: [[TMP12:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; DEFAULT-NEXT: [[TMP13:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; DEFAULT-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; DEFAULT: pred.load.if1:
				; DEFAULT-NEXT: [[TMP14:%.*]] = load i8, i8* [[TMP6]], align 1
				; DEFAULT-NEXT: [[TMP15:%.*]] = insertelement <4 x i8> [[TMP12]], i8 [[TMP14]], i32 1
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; DEFAULT: pred.load.continue2:
				; DEFAULT-NEXT: [[TMP16:%.*]] = phi <4 x i8> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP15]], [[PRED_LOAD_IF1]] ]
				; DEFAULT-NEXT: [[TMP17:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; DEFAULT-NEXT: br i1 [[TMP17]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; DEFAULT: pred.load.if3:
				; DEFAULT-NEXT: [[TMP18:%.*]] = load i8, i8* [[TMP7]], align 1
				; DEFAULT-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> [[TMP16]], i8 [[TMP18]], i32 2
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; DEFAULT: pred.load.continue4:
				; DEFAULT-NEXT: [[TMP20:%.*]] = phi <4 x i8> [ [[TMP16]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP19]], [[PRED_LOAD_IF3]] ]
				; DEFAULT-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; DEFAULT-NEXT: br i1 [[TMP21]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; DEFAULT: pred.load.if5:
				; DEFAULT-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP8]], align 1
				; DEFAULT-NEXT: [[TMP23:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP22]], i32 3
				; DEFAULT-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; DEFAULT: pred.load.continue6:
				; DEFAULT-NEXT: [[TMP24:%.*]] = phi <4 x i8> [ [[TMP20]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], [[PRED_LOAD_IF5]] ]
				; DEFAULT-NEXT: [[TMP25:%.*]] = icmp eq <4 x i8> [[TMP24]], zeroinitializer
				; DEFAULT-NEXT: [[TMP26:%.*]] = select <4 x i1> [[TMP25]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; DEFAULT-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; DEFAULT-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; DEFAULT: pred.store.if:
				; DEFAULT-NEXT: [[TMP28:%.*]] = extractelement <4 x i8> [[TMP26]], i32 0
				; DEFAULT-NEXT: store i8 [[TMP28]], i8* [[TMP5]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE]]
				; DEFAULT: pred.store.continue:
				; DEFAULT-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; DEFAULT-NEXT: br i1 [[TMP29]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; DEFAULT: pred.store.if7:
				; DEFAULT-NEXT: [[TMP30:%.*]] = extractelement <4 x i8> [[TMP26]], i32 1
				; DEFAULT-NEXT: store i8 [[TMP30]], i8* [[TMP6]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; DEFAULT: pred.store.continue8:
				; DEFAULT-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; DEFAULT-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; DEFAULT: pred.store.if9:
				; DEFAULT-NEXT: [[TMP32:%.*]] = extractelement <4 x i8> [[TMP26]], i32 2
				; DEFAULT-NEXT: store i8 [[TMP32]], i8* [[TMP7]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; DEFAULT: pred.store.continue10:
				; DEFAULT-NEXT: [[TMP33:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; DEFAULT-NEXT: br i1 [[TMP33]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; DEFAULT: pred.store.if11:
				; DEFAULT-NEXT: [[TMP34:%.*]] = extractelement <4 x i8> [[TMP26]], i32 3
				; DEFAULT-NEXT: store i8 [[TMP34]], i8* [[TMP8]], align 1
				; DEFAULT-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; DEFAULT: pred.store.continue12:
				; DEFAULT-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; DEFAULT-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; DEFAULT-NEXT: [[TMP35:%.*]] = icmp eq i32 [[INDEX_NEXT]], 204
				; DEFAULT-NEXT: br i1 [[TMP35]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; DEFAULT: middle.block:
				; DEFAULT-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; DEFAULT: scalar.ph:
				; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 204, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
				; DEFAULT: for.body:
				; DEFAULT-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; DEFAULT-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; DEFAULT-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP36]], 0
				; DEFAULT-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; DEFAULT-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; DEFAULT-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; DEFAULT-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], 202
				; DEFAULT-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
				; DEFAULT: for.end:
				; DEFAULT-NEXT: ret i32 0
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, 202			%exitcond = icmp eq i32 %i.08, 202
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	attributes #1 = { minsize }			attributes #1 = { minsize }

	define i32 @foo_pgso() !prof !14 {			define i32 @foo_pgso() !prof !14 {
	; PGSO-LABEL: @foo_pgso(			; PGSO-LABEL: @foo_pgso(
	; PGSO-NOT: <{{[0-9]+}} x i8>			; PGSO-NEXT: entry:
				; PGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; PGSO: vector.ph:
				; PGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; PGSO: vector.body:
				; PGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; PGSO-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; PGSO-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; PGSO-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
				; PGSO-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
				; PGSO-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
				; PGSO-NEXT: [[TMP4:%.*]] = icmp ule <4 x i32> [[VEC_IND]], <i32 202, i32 202, i32 202, i32 202>
				; PGSO-NEXT: [[TMP5:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; PGSO-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; PGSO-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; PGSO-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; PGSO-NEXT: [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; PGSO-NEXT: br i1 [[TMP9]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; PGSO: pred.load.if:
				; PGSO-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP5]], align 1
				; PGSO-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
				; PGSO-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; PGSO: pred.load.continue:
				; PGSO-NEXT: [[TMP12:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; PGSO-NEXT: [[TMP13:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; PGSO-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; PGSO: pred.load.if1:
				; PGSO-NEXT: [[TMP14:%.*]] = load i8, i8* [[TMP6]], align 1
				; PGSO-NEXT: [[TMP15:%.*]] = insertelement <4 x i8> [[TMP12]], i8 [[TMP14]], i32 1
				; PGSO-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; PGSO: pred.load.continue2:
				; PGSO-NEXT: [[TMP16:%.*]] = phi <4 x i8> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP15]], [[PRED_LOAD_IF1]] ]
				; PGSO-NEXT: [[TMP17:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; PGSO-NEXT: br i1 [[TMP17]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; PGSO: pred.load.if3:
				; PGSO-NEXT: [[TMP18:%.*]] = load i8, i8* [[TMP7]], align 1
				; PGSO-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> [[TMP16]], i8 [[TMP18]], i32 2
				; PGSO-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; PGSO: pred.load.continue4:
				; PGSO-NEXT: [[TMP20:%.*]] = phi <4 x i8> [ [[TMP16]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP19]], [[PRED_LOAD_IF3]] ]
				; PGSO-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; PGSO-NEXT: br i1 [[TMP21]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; PGSO: pred.load.if5:
				; PGSO-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP8]], align 1
				; PGSO-NEXT: [[TMP23:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP22]], i32 3
				; PGSO-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; PGSO: pred.load.continue6:
				; PGSO-NEXT: [[TMP24:%.*]] = phi <4 x i8> [ [[TMP20]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], [[PRED_LOAD_IF5]] ]
				; PGSO-NEXT: [[TMP25:%.*]] = icmp eq <4 x i8> [[TMP24]], zeroinitializer
				; PGSO-NEXT: [[TMP26:%.*]] = select <4 x i1> [[TMP25]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; PGSO-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; PGSO-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; PGSO: pred.store.if:
				; PGSO-NEXT: [[TMP28:%.*]] = extractelement <4 x i8> [[TMP26]], i32 0
				; PGSO-NEXT: store i8 [[TMP28]], i8* [[TMP5]], align 1
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE]]
				; PGSO: pred.store.continue:
				; PGSO-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; PGSO-NEXT: br i1 [[TMP29]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; PGSO: pred.store.if7:
				; PGSO-NEXT: [[TMP30:%.*]] = extractelement <4 x i8> [[TMP26]], i32 1
				; PGSO-NEXT: store i8 [[TMP30]], i8* [[TMP6]], align 1
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; PGSO: pred.store.continue8:
				; PGSO-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; PGSO-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; PGSO: pred.store.if9:
				; PGSO-NEXT: [[TMP32:%.*]] = extractelement <4 x i8> [[TMP26]], i32 2
				; PGSO-NEXT: store i8 [[TMP32]], i8* [[TMP7]], align 1
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; PGSO: pred.store.continue10:
				; PGSO-NEXT: [[TMP33:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; PGSO-NEXT: br i1 [[TMP33]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; PGSO: pred.store.if11:
				; PGSO-NEXT: [[TMP34:%.*]] = extractelement <4 x i8> [[TMP26]], i32 3
				; PGSO-NEXT: store i8 [[TMP34]], i8* [[TMP8]], align 1
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; PGSO: pred.store.continue12:
				; PGSO-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; PGSO-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; PGSO-NEXT: [[TMP35:%.*]] = icmp eq i32 [[INDEX_NEXT]], 204
				; PGSO-NEXT: br i1 [[TMP35]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
				; PGSO: middle.block:
				; PGSO-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; PGSO: scalar.ph:
				; PGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 204, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; PGSO-NEXT: br label [[FOR_BODY:%.*]]
				; PGSO: for.body:
				; PGSO-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; PGSO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; PGSO-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; PGSO-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP36]], 0
				; PGSO-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; PGSO-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; PGSO-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; PGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], 202
				; PGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
				; PGSO: for.end:
				; PGSO-NEXT: ret i32 0
				;
	; NPGSO-LABEL: @foo_pgso(			; NPGSO-LABEL: @foo_pgso(
	; NPGSO: <{{[0-9]+}} x i8>			; NPGSO-NEXT: entry:
				; NPGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; NPGSO: vector.ph:
				; NPGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; NPGSO: vector.body:
				; NPGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; NPGSO-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; NPGSO-NEXT: [[TMP1:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; NPGSO-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0
				; NPGSO-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; NPGSO-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1
				; NPGSO-NEXT: [[TMP4:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
				; NPGSO-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; NPGSO-NEXT: [[TMP6:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; NPGSO-NEXT: store <4 x i8> [[TMP5]], <4 x i8>* [[TMP6]], align 1
				; NPGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; NPGSO-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 200
				; NPGSO-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
				; NPGSO: middle.block:
				; NPGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 203, 200
				; NPGSO-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; NPGSO: scalar.ph:
				; NPGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; NPGSO-NEXT: br label [[FOR_BODY:%.*]]
				; NPGSO: for.body:
				; NPGSO-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; NPGSO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; NPGSO-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; NPGSO-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP8]], 0
				; NPGSO-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; NPGSO-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; NPGSO-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; NPGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], 202
				; NPGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
				; NPGSO: for.end:
				; NPGSO-NEXT: ret i32 0
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, 202			%exitcond = icmp eq i32 %i.08, 202
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	; PR43371: don't run into an assert due to emitting SCEV runtime checks			; PR43371: don't run into an assert due to emitting SCEV runtime checks
	; with OptForSize.			; with OptForSize.
	;			;
	@cm_array = external global [2592 x i16], align 1			@cm_array = external global [2592 x i16], align 1

	define void @pr43371() optsize {			define void @pr43371() optsize {
	;			; DEFAULT-LABEL: @pr43371(
	; CHECK-LABEL: @pr43371			; DEFAULT-NEXT: entry:
	; CHECK-NOT: vector.scevcheck			; DEFAULT-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	;			; DEFAULT: vector.ph:
	; We do not want to generate SCEV predicates when optimising for size, because			; DEFAULT-NEXT: br label [[VECTOR_BODY:%.*]]
	; that will lead to extra code generation such as the SCEV overflow runtime			; DEFAULT: vector.body:
	; checks. Not generating SCEV predicates can still result in vectorisation as			; DEFAULT-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; the non-consecutive loads/stores can be scalarized:			; DEFAULT-NEXT: [[VEC_IND:%.*]] = phi <2 x i16> [ <i16 0, i16 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	;			; DEFAULT-NEXT: [[TMP0:%.*]] = add <2 x i16> undef, [[VEC_IND]]
	; CHECK: vector.body:			; DEFAULT-NEXT: [[TMP1:%.*]] = zext <2 x i16> [[TMP0]] to <2 x i32>
	; CHECK: store i16 0, i16* %{{.*}}, align 1			; DEFAULT-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; CHECK: store i16 0, i16* %{{.*}}, align 1			; DEFAULT-NEXT: [[TMP3:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[TMP2]]
	; CHECK: br i1 {{.*}}, label %vector.body			; DEFAULT-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
				; DEFAULT-NEXT: [[TMP5:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[TMP4]]
				; DEFAULT-NEXT: store i16 0, i16* [[TMP3]], align 1
				; DEFAULT-NEXT: store i16 0, i16* [[TMP5]], align 1
				; DEFAULT-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; DEFAULT-NEXT: [[VEC_IND_NEXT]] = add <2 x i16> [[VEC_IND]], <i16 2, i16 2>
				; DEFAULT-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], 756
				; DEFAULT-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
				; DEFAULT: middle.block:
				; DEFAULT-NEXT: [[CMP_N:%.*]] = icmp eq i32 756, 756
				; DEFAULT-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP28:%.*]], label [[SCALAR_PH]]
				; DEFAULT: scalar.ph:
				; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 756, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; DEFAULT-NEXT: br label [[FOR_BODY29:%.*]]
				; DEFAULT: for.cond.cleanup28:
				; DEFAULT-NEXT: unreachable
				; DEFAULT: for.body29:
				; DEFAULT-NEXT: [[I24_0170:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC37:%.*]], [[FOR_BODY29]] ]
				; DEFAULT-NEXT: [[ADD33:%.*]] = add i16 undef, [[I24_0170]]
				; DEFAULT-NEXT: [[IDXPROM34:%.*]] = zext i16 [[ADD33]] to i32
				; DEFAULT-NEXT: [[ARRAYIDX35:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[IDXPROM34]]
				; DEFAULT-NEXT: store i16 0, i16* [[ARRAYIDX35]], align 1
				; DEFAULT-NEXT: [[INC37]] = add i16 [[I24_0170]], 1
				; DEFAULT-NEXT: [[CMP26:%.*]] = icmp ult i16 [[INC37]], 756
				; DEFAULT-NEXT: br i1 [[CMP26]], label [[FOR_BODY29]], label [[FOR_COND_CLEANUP28]], !llvm.loop [[LOOP24:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body29			br label %for.body29

	for.cond.cleanup28:			for.cond.cleanup28:
	unreachable			unreachable

	for.body29:			for.body29:
	%i24.0170 = phi i16 [ 0, %entry], [ %inc37, %for.body29]			%i24.0170 = phi i16 [ 0, %entry], [ %inc37, %for.body29]
	%add33 = add i16 undef, %i24.0170			%add33 = add i16 undef, %i24.0170
	%idxprom34 = zext i16 %add33 to i32			%idxprom34 = zext i16 %add33 to i32
	%arrayidx35 = getelementptr [2592 x i16], [2592 x i16] * @cm_array, i32 0, i32 %idxprom34			%arrayidx35 = getelementptr [2592 x i16], [2592 x i16] * @cm_array, i32 0, i32 %idxprom34
	store i16 0, i16 * %arrayidx35, align 1			store i16 0, i16 * %arrayidx35, align 1
	%inc37 = add i16 %i24.0170, 1			%inc37 = add i16 %i24.0170, 1
	%cmp26 = icmp ult i16 %inc37, 756			%cmp26 = icmp ult i16 %inc37, 756
	br i1 %cmp26, label %for.body29, label %for.cond.cleanup28			br i1 %cmp26, label %for.body29, label %for.cond.cleanup28
	}			}

	define void @pr43371_pgso() !prof !14 {			define void @pr43371_pgso() !prof !14 {
				; PGSO-LABEL: @pr43371_pgso(
				; PGSO-NEXT: entry:
				; PGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; PGSO: vector.ph:
				; PGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; PGSO: vector.body:
				; PGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; PGSO-NEXT: [[VEC_IND:%.*]] = phi <2 x i16> [ <i16 0, i16 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; PGSO-NEXT: [[TMP0:%.*]] = add <2 x i16> undef, [[VEC_IND]]
				; PGSO-NEXT: [[TMP1:%.*]] = zext <2 x i16> [[TMP0]] to <2 x i32>
				; PGSO-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
				; PGSO-NEXT: [[TMP3:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[TMP2]]
				; PGSO-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
				; PGSO-NEXT: [[TMP5:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[TMP4]]
				; PGSO-NEXT: store i16 0, i16* [[TMP3]], align 1
				; PGSO-NEXT: store i16 0, i16* [[TMP5]], align 1
				; PGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; PGSO-NEXT: [[VEC_IND_NEXT]] = add <2 x i16> [[VEC_IND]], <i16 2, i16 2>
				; PGSO-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], 756
				; PGSO-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP25:![0-9]+]]
				; PGSO: middle.block:
				; PGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 756, 756
				; PGSO-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP28:%.*]], label [[SCALAR_PH]]
				; PGSO: scalar.ph:
				; PGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 756, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; PGSO-NEXT: br label [[FOR_BODY29:%.*]]
				; PGSO: for.cond.cleanup28:
				; PGSO-NEXT: unreachable
				; PGSO: for.body29:
				; PGSO-NEXT: [[I24_0170:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC37:%.*]], [[FOR_BODY29]] ]
				; PGSO-NEXT: [[ADD33:%.*]] = add i16 undef, [[I24_0170]]
				; PGSO-NEXT: [[IDXPROM34:%.*]] = zext i16 [[ADD33]] to i32
				; PGSO-NEXT: [[ARRAYIDX35:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[IDXPROM34]]
				; PGSO-NEXT: store i16 0, i16* [[ARRAYIDX35]], align 1
				; PGSO-NEXT: [[INC37]] = add i16 [[I24_0170]], 1
				; PGSO-NEXT: [[CMP26:%.*]] = icmp ult i16 [[INC37]], 756
				; PGSO-NEXT: br i1 [[CMP26]], label [[FOR_BODY29]], label [[FOR_COND_CLEANUP28]], !llvm.loop [[LOOP26:![0-9]+]]
	;			;
	; CHECK-LABEL: @pr43371_pgso			; NPGSO-LABEL: @pr43371_pgso(
	; CHECK-NOT: vector.scevcheck			; NPGSO-NEXT: entry:
	;			; NPGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
	; We do not want to generate SCEV predicates when optimising for size, because			; NPGSO: vector.scevcheck:
	; that will lead to extra code generation such as the SCEV overflow runtime			; NPGSO-NEXT: br i1 undef, label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; checks. Not generating SCEV predicates can still result in vectorisation as			; NPGSO: vector.ph:
	; the non-consecutive loads/stores can be scalarized:			; NPGSO-NEXT: br label [[VECTOR_BODY:%.*]]
	;			; NPGSO: vector.body:
	; CHECK: vector.body:			; NPGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK: store i16 0, i16* %{{.*}}, align 1			; NPGSO-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK: store i16 0, i16* %{{.*}}, align 1			; NPGSO-NEXT: [[TMP0:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK: br i1 {{.*}}, label %vector.body			; NPGSO-NEXT: [[TMP1:%.*]] = add i16 undef, [[TMP0]]
				; NPGSO-NEXT: [[TMP2:%.*]] = zext i16 [[TMP1]] to i32
				; NPGSO-NEXT: [[TMP3:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[TMP2]]
				; NPGSO-NEXT: [[TMP4:%.*]] = getelementptr i16, i16* [[TMP3]], i32 0
				; NPGSO-NEXT: [[TMP5:%.*]] = bitcast i16* [[TMP4]] to <2 x i16>*
				; NPGSO-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP5]], align 1
				; NPGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; NPGSO-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], 756
				; NPGSO-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP25:![0-9]+]]
				; NPGSO: middle.block:
				; NPGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 756, 756
				; NPGSO-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP28:%.*]], label [[SCALAR_PH]]
				; NPGSO: scalar.ph:
				; NPGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 756, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
				; NPGSO-NEXT: br label [[FOR_BODY29:%.*]]
				; NPGSO: for.cond.cleanup28:
				; NPGSO-NEXT: unreachable
				; NPGSO: for.body29:
				; NPGSO-NEXT: [[I24_0170:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC37:%.*]], [[FOR_BODY29]] ]
				; NPGSO-NEXT: [[ADD33:%.*]] = add i16 undef, [[I24_0170]]
				; NPGSO-NEXT: [[IDXPROM34:%.*]] = zext i16 [[ADD33]] to i32
				; NPGSO-NEXT: [[ARRAYIDX35:%.*]] = getelementptr [2592 x i16], [2592 x i16]* @cm_array, i32 0, i32 [[IDXPROM34]]
				; NPGSO-NEXT: store i16 0, i16* [[ARRAYIDX35]], align 1
				; NPGSO-NEXT: [[INC37]] = add i16 [[I24_0170]], 1
				; NPGSO-NEXT: [[CMP26:%.*]] = icmp ult i16 [[INC37]], 756
				; NPGSO-NEXT: br i1 [[CMP26]], label [[FOR_BODY29]], label [[FOR_COND_CLEANUP28]], !llvm.loop [[LOOP26:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body29			br label %for.body29

	for.cond.cleanup28:			for.cond.cleanup28:
	unreachable			unreachable

	for.body29:			for.body29:
	%i24.0170 = phi i16 [ 0, %entry], [ %inc37, %for.body29]			%i24.0170 = phi i16 [ 0, %entry], [ %inc37, %for.body29]
	%add33 = add i16 undef, %i24.0170			%add33 = add i16 undef, %i24.0170
	%idxprom34 = zext i16 %add33 to i32			%idxprom34 = zext i16 %add33 to i32
	%arrayidx35 = getelementptr [2592 x i16], [2592 x i16] * @cm_array, i32 0, i32 %idxprom34			%arrayidx35 = getelementptr [2592 x i16], [2592 x i16] * @cm_array, i32 0, i32 %idxprom34
	store i16 0, i16 * %arrayidx35, align 1			store i16 0, i16 * %arrayidx35, align 1
	%inc37 = add i16 %i24.0170, 1			%inc37 = add i16 %i24.0170, 1
	%cmp26 = icmp ult i16 %inc37, 756			%cmp26 = icmp ult i16 %inc37, 756
	br i1 %cmp26, label %for.body29, label %for.cond.cleanup28			br i1 %cmp26, label %for.body29, label %for.cond.cleanup28
	}			}

	; PR45526: don't vectorize with fold-tail if first-order-recurrence is live-out.			; PR45526: don't vectorize with fold-tail if first-order-recurrence is live-out.
	;			;
	define i32 @pr45526() optsize {			define i32 @pr45526() optsize {
	;			; DEFAULT-LABEL: @pr45526(
	; CHECK-LABEL: @pr45526			; DEFAULT-NEXT: entry:
	; CHECK-NEXT: entry:			; DEFAULT-NEXT: br label [[LOOP:%.*]]
	; CHECK-NEXT: br label %loop			; DEFAULT: loop:
	; CHECK-EMPTY:			; DEFAULT-NEXT: [[PIV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[PIVPLUS1:%.*]], [[LOOP]] ]
	; CHECK-NEXT: loop:			; DEFAULT-NEXT: [[FOR:%.*]] = phi i32 [ 5, [[ENTRY]] ], [ [[PIVPLUS1]], [[LOOP]] ]
	; CHECK-NEXT: %piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]			; DEFAULT-NEXT: [[PIVPLUS1]] = add nuw nsw i32 [[PIV]], 1
	; CHECK-NEXT: %for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]			; DEFAULT-NEXT: [[COND:%.*]] = icmp ult i32 [[PIV]], 510
	; CHECK-NEXT: %pivPlus1 = add nuw nsw i32 %piv, 1			; DEFAULT-NEXT: br i1 [[COND]], label [[LOOP]], label [[EXIT:%.*]]
	; CHECK-NEXT: %cond = icmp ult i32 %piv, 510			; DEFAULT: exit:
	; CHECK-NEXT: br i1 %cond, label %loop, label %exit			; DEFAULT-NEXT: [[FOR_LCSSA:%.*]] = phi i32 [ [[FOR]], [[LOOP]] ]
	; CHECK-EMPTY:			; DEFAULT-NEXT: ret i32 [[FOR_LCSSA]]
	; CHECK-NEXT: exit:
	; CHECK-NEXT: %for.lcssa = phi i32 [ %for, %loop ]
	; CHECK-NEXT: ret i32 %for.lcssa
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]			%piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]
	%for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]			%for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]
	%pivPlus1 = add nuw nsw i32 %piv, 1			%pivPlus1 = add nuw nsw i32 %piv, 1
	%cond = icmp ult i32 %piv, 510			%cond = icmp ult i32 %piv, 510
	br i1 %cond, label %loop, label %exit			br i1 %cond, label %loop, label %exit

	exit:			exit:
	ret i32 %for			ret i32 %for
	}			}

	define i32 @pr45526_pgso() !prof !14 {			define i32 @pr45526_pgso() !prof !14 {
				; PGSO-LABEL: @pr45526_pgso(
				; PGSO-NEXT: entry:
				; PGSO-NEXT: br label [[LOOP:%.*]]
				; PGSO: loop:
				; PGSO-NEXT: [[PIV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[PIVPLUS1:%.*]], [[LOOP]] ]
				; PGSO-NEXT: [[FOR:%.*]] = phi i32 [ 5, [[ENTRY]] ], [ [[PIVPLUS1]], [[LOOP]] ]
				; PGSO-NEXT: [[PIVPLUS1]] = add nuw nsw i32 [[PIV]], 1
				; PGSO-NEXT: [[COND:%.*]] = icmp ult i32 [[PIV]], 510
				; PGSO-NEXT: br i1 [[COND]], label [[LOOP]], label [[EXIT:%.*]]
				; PGSO: exit:
				; PGSO-NEXT: [[FOR_LCSSA:%.*]] = phi i32 [ [[FOR]], [[LOOP]] ]
				; PGSO-NEXT: ret i32 [[FOR_LCSSA]]
	;			;
	; CHECK-LABEL: @pr45526_pgso			; NPGSO-LABEL: @pr45526_pgso(
	; CHECK-NEXT: entry:			; NPGSO-NEXT: entry:
	; CHECK-NEXT: br label %loop			; NPGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK-EMPTY:			; NPGSO: vector.ph:
	; CHECK-NEXT: loop:			; NPGSO-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK-NEXT: %piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]			; NPGSO: vector.body:
	; CHECK-NEXT: %for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]			; NPGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: %pivPlus1 = add nuw nsw i32 %piv, 1			; NPGSO-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: %cond = icmp ult i32 %piv, 510			; NPGSO-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 5>, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: br i1 %cond, label %loop, label %exit			; NPGSO-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-EMPTY:			; NPGSO-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
	; CHECK-NEXT: exit:			; NPGSO-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: %for.lcssa = phi i32 [ %for, %loop ]			; NPGSO-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
	; CHECK-NEXT: ret i32 %for.lcssa			; NPGSO-NEXT: [[TMP4]] = add nuw nsw <4 x i32> [[VEC_IND]], <i32 1, i32 1, i32 1, i32 1>
				; NPGSO-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[TMP4]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; NPGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; NPGSO-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; NPGSO-NEXT: [[TMP6:%.*]] = icmp eq i32 [[INDEX_NEXT]], 508
				; NPGSO-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP27:![0-9]+]]
				; NPGSO: middle.block:
				; NPGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 511, 508
				; NPGSO-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
				; NPGSO-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
				; NPGSO-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; NPGSO: scalar.ph:
				; NPGSO-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 5, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
				; NPGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 508, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
				; NPGSO-NEXT: br label [[LOOP:%.*]]
				; NPGSO: loop:
				; NPGSO-NEXT: [[PIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[PIVPLUS1:%.*]], [[LOOP]] ]
				; NPGSO-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[PIVPLUS1]], [[LOOP]] ]
				; NPGSO-NEXT: [[PIVPLUS1]] = add nuw nsw i32 [[PIV]], 1
				; NPGSO-NEXT: [[COND:%.*]] = icmp ult i32 [[PIV]], 510
				; NPGSO-NEXT: br i1 [[COND]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP28:![0-9]+]]
				; NPGSO: exit:
				; NPGSO-NEXT: [[FOR_LCSSA:%.*]] = phi i32 [ [[SCALAR_RECUR]], [[LOOP]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]
				; NPGSO-NEXT: ret i32 [[FOR_LCSSA]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]			%piv = phi i32 [ 0, %entry ], [ %pivPlus1, %loop ]
	%for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]			%for = phi i32 [ 5, %entry ], [ %pivPlus1, %loop ]
	%pivPlus1 = add nuw nsw i32 %piv, 1			%pivPlus1 = add nuw nsw i32 %piv, 1
	%cond = icmp ult i32 %piv, 510			%cond = icmp ult i32 %piv, 510
	br i1 %cond, label %loop, label %exit			br i1 %cond, label %loop, label %exit

	exit:			exit:
	ret i32 %for			ret i32 %for
	}			}

	; PR46228: Vectorize w/o versioning for unit stride under optsize and enabled			; PR46228: Vectorize w/o versioning for unit stride under optsize and enabled
	; vectorization.			; vectorization.

	; NOTE: Some assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Some assertions have been autogenerated by utils/update_test_checks.py
	define void @stride1(i16* noalias %B, i32 %BStride) optsize {			define void @stride1(i16* noalias %B, i32 %BStride) optsize {
	; CHECK-LABEL: @stride1(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[BSTRIDE:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <2 x i32> [[VEC_IND]], <i32 1024, i32 1024>
	; CHECK-NEXT: [[TMP0:%.*]] = mul nsw <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
	; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; CHECK: pred.store.if:
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP3]]
	; CHECK-NEXT: store i16 42, i16* [[TMP4]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
	; CHECK: pred.store.continue:
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
	; CHECK-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]]
	; CHECK: pred.store.if1:
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[TMP6]]
	; CHECK-NEXT: store i16 42, i16* [[TMP7]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
	; CHECK: pred.store.continue2:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1026
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !21
	; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:
	; CHECK: for.end:
	; CHECK-NEXT: ret void
	;
	; PGSO-LABEL: @stride1(			; PGSO-LABEL: @stride1(
	; PGSO-NEXT: entry:			; PGSO-NEXT: entry:
	; PGSO-NEXT: br i1 false, label %scalar.ph, label %vector.ph			; PGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; PGSO: vector.ph:
				; PGSO-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[BSTRIDE:%.*]], i32 0
				; PGSO-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; PGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; PGSO: vector.body:
				; PGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
				; PGSO-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ]
				; PGSO-NEXT: [[TMP0:%.*]] = icmp ule <2 x i32> [[VEC_IND]], <i32 1024, i32 1024>
				; PGSO-NEXT: [[TMP1:%.*]] = mul nsw <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; PGSO-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
				; PGSO-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; PGSO: pred.store.if:
				; PGSO-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
				; PGSO-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP3]]
				; PGSO-NEXT: store i16 42, i16* [[TMP4]], align 4
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE]]
				; PGSO: pred.store.continue:
				; PGSO-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
				; PGSO-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]]
				; PGSO: pred.store.if1:
				; PGSO-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
				; PGSO-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[TMP6]]
				; PGSO-NEXT: store i16 42, i16* [[TMP7]], align 4
				; PGSO-NEXT: br label [[PRED_STORE_CONTINUE2]]
				; PGSO: pred.store.continue2:
				; PGSO-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
				; PGSO-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
				; PGSO-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1026
				; PGSO-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP27:![0-9]+]]
				; PGSO: middle.block:
				; PGSO-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; PGSO: scalar.ph:
				; PGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1026, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; PGSO-NEXT: br label [[FOR_BODY:%.*]]
				; PGSO: for.body:
				; PGSO-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; PGSO-NEXT: [[MULB:%.*]] = mul nsw i32 [[IV]], [[BSTRIDE]]
				; PGSO-NEXT: [[GEPOFB:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[MULB]]
				; PGSO-NEXT: store i16 42, i16* [[GEPOFB]], align 4
				; PGSO-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
				; PGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[IV_NEXT]], 1025
				; PGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
				; PGSO: for.end:
				; PGSO-NEXT: ret void
	;			;
	; NPGSO-LABEL: @stride1(			; NPGSO-LABEL: @stride1(
	; NPGSO-NEXT: entry:			; NPGSO-NEXT: entry:
	; NPGSO-NEXT: br i1 false, label %scalar.ph, label %vector.ph			; NPGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; NPGSO: vector.ph:
				; NPGSO-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[BSTRIDE:%.*]], i32 0
				; NPGSO-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; NPGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; NPGSO: vector.body:
				; NPGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
				; NPGSO-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ]
				; NPGSO-NEXT: [[TMP0:%.*]] = icmp ule <2 x i32> [[VEC_IND]], <i32 1024, i32 1024>
				; NPGSO-NEXT: [[TMP1:%.*]] = mul nsw <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; NPGSO-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP0]], i32 0
				; NPGSO-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; NPGSO: pred.store.if:
				; NPGSO-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
				; NPGSO-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP3]]
				; NPGSO-NEXT: store i16 42, i16* [[TMP4]], align 4
				; NPGSO-NEXT: br label [[PRED_STORE_CONTINUE]]
				; NPGSO: pred.store.continue:
				; NPGSO-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP0]], i32 1
				; NPGSO-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]]
				; NPGSO: pred.store.if1:
				; NPGSO-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
				; NPGSO-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[TMP6]]
				; NPGSO-NEXT: store i16 42, i16* [[TMP7]], align 4
				; NPGSO-NEXT: br label [[PRED_STORE_CONTINUE2]]
				; NPGSO: pred.store.continue2:
				; NPGSO-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
				; NPGSO-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
				; NPGSO-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1026
				; NPGSO-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]
				; NPGSO: middle.block:
				; NPGSO-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; NPGSO: scalar.ph:
				; NPGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1026, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; NPGSO-NEXT: br label [[FOR_BODY:%.*]]
				; NPGSO: for.body:
				; NPGSO-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; NPGSO-NEXT: [[MULB:%.*]] = mul nsw i32 [[IV]], [[BSTRIDE]]
				; NPGSO-NEXT: [[GEPOFB:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[MULB]]
				; NPGSO-NEXT: store i16 42, i16* [[GEPOFB]], align 4
				; NPGSO-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
				; NPGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[IV_NEXT]], 1025
				; NPGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
				; NPGSO: for.end:
				; NPGSO-NEXT: ret void
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i32 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i32 [ %iv.next, %for.body ], [ 0, %entry ]
	%mulB = mul nsw i32 %iv, %BStride			%mulB = mul nsw i32 %iv, %BStride
	%gepOfB = getelementptr inbounds i16, i16* %B, i32 %mulB			%gepOfB = getelementptr inbounds i16, i16* %B, i32 %mulB
	store i16 42, i16* %gepOfB, align 4			store i16 42, i16* %gepOfB, align 4
	%iv.next = add nuw nsw i32 %iv, 1			%iv.next = add nuw nsw i32 %iv, 1
	%exitcond = icmp eq i32 %iv.next, 1025			%exitcond = icmp eq i32 %iv.next, 1025
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15

	for.end:			for.end:
	ret void			ret void
	}			}

	; Vectorize with versioning for unit stride for PGSO and enabled vectorization.			; Vectorize with versioning for unit stride for PGSO and enabled vectorization.
	;			;
	define void @stride1_pgso(i16* noalias %B, i32 %BStride) !prof !14 {			define void @stride1_pgso(i16* noalias %B, i32 %BStride) !prof !14 {
	; CHECK-LABEL: @stride1_pgso(
	; CHECK: vector.body
	;
	; PGSO-LABEL: @stride1_pgso(			; PGSO-LABEL: @stride1_pgso(
	; PGSO: vector.body			; PGSO-NEXT: entry:
				; PGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
				; PGSO: vector.scevcheck:
				; PGSO-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[BSTRIDE:%.*]], 1
				; PGSO-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; PGSO: vector.ph:
				; PGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; PGSO: vector.body:
				; PGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; PGSO-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; PGSO-NEXT: [[TMP1:%.*]] = mul nsw i32 [[TMP0]], [[BSTRIDE]]
				; PGSO-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP1]]
				; PGSO-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[TMP2]], i32 0
				; PGSO-NEXT: [[TMP4:%.*]] = bitcast i16* [[TMP3]] to <2 x i16>*
				; PGSO-NEXT: store <2 x i16> <i16 42, i16 42>, <2 x i16>* [[TMP4]], align 4
				; PGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; PGSO-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
				; PGSO-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP29:![0-9]+]]
				; PGSO: middle.block:
				; PGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 1025, 1024
				; PGSO-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; PGSO: scalar.ph:
				; PGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
				; PGSO-NEXT: br label [[FOR_BODY:%.*]]
				; PGSO: for.body:
				; PGSO-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; PGSO-NEXT: [[MULB:%.*]] = mul nsw i32 [[IV]], [[BSTRIDE]]
				; PGSO-NEXT: [[GEPOFB:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[MULB]]
				; PGSO-NEXT: store i16 42, i16* [[GEPOFB]], align 4
				; PGSO-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
				; PGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[IV_NEXT]], 1025
				; PGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP30:![0-9]+]]
				; PGSO: for.end:
				; PGSO-NEXT: ret void
	;			;
	; NPGSO-LABEL: @stride1_pgso(			; NPGSO-LABEL: @stride1_pgso(
	; NPGSO: vector.body			; NPGSO-NEXT: entry:
				; NPGSO-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
				; NPGSO: vector.scevcheck:
				; NPGSO-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i32 [[BSTRIDE:%.*]], 1
				; NPGSO-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; NPGSO: vector.ph:
				; NPGSO-NEXT: br label [[VECTOR_BODY:%.*]]
				; NPGSO: vector.body:
				; NPGSO-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; NPGSO-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; NPGSO-NEXT: [[TMP1:%.*]] = mul nsw i32 [[TMP0]], [[BSTRIDE]]
				; NPGSO-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, i16* [[B:%.*]], i32 [[TMP1]]
				; NPGSO-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, i16* [[TMP2]], i32 0
				; NPGSO-NEXT: [[TMP4:%.*]] = bitcast i16* [[TMP3]] to <2 x i16>*
				; NPGSO-NEXT: store <2 x i16> <i16 42, i16 42>, <2 x i16>* [[TMP4]], align 4
				; NPGSO-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
				; NPGSO-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1024
				; NPGSO-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]
				; NPGSO: middle.block:
				; NPGSO-NEXT: [[CMP_N:%.*]] = icmp eq i32 1025, 1024
				; NPGSO-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; NPGSO: scalar.ph:
				; NPGSO-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
				; NPGSO-NEXT: br label [[FOR_BODY:%.*]]
				; NPGSO: for.body:
				; NPGSO-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
				; NPGSO-NEXT: [[MULB:%.*]] = mul nsw i32 [[IV]], [[BSTRIDE]]
				; NPGSO-NEXT: [[GEPOFB:%.*]] = getelementptr inbounds i16, i16* [[B]], i32 [[MULB]]
				; NPGSO-NEXT: store i16 42, i16* [[GEPOFB]], align 4
				; NPGSO-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
				; NPGSO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[IV_NEXT]], 1025
				; NPGSO-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]
				; NPGSO: for.end:
				; NPGSO-NEXT: ret void
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i32 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i32 [ %iv.next, %for.body ], [ 0, %entry ]
	%mulB = mul nsw i32 %iv, %BStride			%mulB = mul nsw i32 %iv, %BStride
	%gepOfB = getelementptr inbounds i16, i16* %B, i32 %mulB			%gepOfB = getelementptr inbounds i16, i16* %B, i32 %mulB
	store i16 42, i16* %gepOfB, align 4			store i16 42, i16* %gepOfB, align 4
	%iv.next = add nuw nsw i32 %iv, 1			%iv.next = add nuw nsw i32 %iv, 1
	%exitcond = icmp eq i32 %iv.next, 1025			%exitcond = icmp eq i32 %iv.next, 1025
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15

	for.end:			for.end:
	ret void			ret void
	}			}

	; PR46652: Check that the need for stride==1 check prevents vectorizing a loop			; PR46652: Check that the need for stride==1 check prevents vectorizing a loop
	; having tiny trip count, when compiling w/o -Os/-Oz.			; having tiny trip count, when compiling w/o -Os/-Oz.
	; CHECK-LABEL: @pr46652
	; CHECK-NOT: vector.scevcheck
	; CHECK-NOT: vector.body
	; CHECK-LABEL: for.body

	@g = external global [1 x i16], align 1			@g = external global [1 x i16], align 1

	define void @pr46652(i16 %stride) {			define void @pr46652(i16 %stride) {
				; DEFAULT-LABEL: @pr46652(
				; DEFAULT-NEXT: entry:
				; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
				; DEFAULT: for.body:
				; DEFAULT-NEXT: [[L1_02:%.*]] = phi i16 [ 1, [[ENTRY:%.*]] ], [ [[INC9:%.*]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: [[MUL:%.*]] = mul nsw i16 [[L1_02]], [[STRIDE:%.*]]
				; DEFAULT-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 [[MUL]]
				; DEFAULT-NEXT: [[TMP0:%.*]] = load i16, i16* [[ARRAYIDX6]], align 1
				; DEFAULT-NEXT: [[INC9]] = add nuw nsw i16 [[L1_02]], 1
				; DEFAULT-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i16 [[INC9]], 16
				; DEFAULT-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
				; DEFAULT: for.end:
				; DEFAULT-NEXT: ret void
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ]			%l1.02 = phi i16 [ 1, %entry ], [ %inc9, %for.body ]
	%mul = mul nsw i16 %l1.02, %stride			%mul = mul nsw i16 %l1.02, %stride
	%arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul			%arrayidx6 = getelementptr inbounds [1 x i16], [1 x i16]* @g, i16 0, i16 %mul
	%0 = load i16, i16* %arrayidx6, align 1			%0 = load i16, i16* %arrayidx6, align 1
	%inc9 = add nuw nsw i16 %l1.02, 1			%inc9 = add nuw nsw i16 %l1.02, 1
	%exitcond.not = icmp eq i16 %inc9, 16			%exitcond.not = icmp eq i16 %inc9, 16
	br i1 %exitcond.not, label %for.end, label %for.body			br i1 %exitcond.not, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	; Make sure we do not crash while building the VPlan for the loop with the			; Make sure we do not crash while building the VPlan for the loop with the
	; select below.			; select below.
	define i32 @PR48142(i32* %ptr.start, i32* %ptr.end) optsize {			define i32 @PR48142(i32* %ptr.start, i32* %ptr.end) optsize {
	; CHECK-LABEL: PR48142			; DEFAULT-LABEL: @PR48142(
	; CHECK-NOT: vector.body			; DEFAULT-NEXT: entry:
				; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
				; DEFAULT: for.body:
				; DEFAULT-NEXT: [[I_014:%.*]] = phi i32 [ 20, [[ENTRY:%.*]] ], [ [[COND:%.*]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: [[PTR_IV:%.*]] = phi i32* [ [[PTR_START:%.*]], [[ENTRY]] ], [ [[PTR_NEXT:%.*]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: [[CMP4:%.*]] = icmp slt i32 [[I_014]], 99
				; DEFAULT-NEXT: [[COND]] = select i1 [[CMP4]], i32 99, i32 [[I_014]]
				; DEFAULT-NEXT: store i32 0, i32* [[PTR_IV]], align 4
				; DEFAULT-NEXT: [[PTR_NEXT]] = getelementptr inbounds i32, i32* [[PTR_IV]], i64 1
				; DEFAULT-NEXT: [[CMP_NOT:%.*]] = icmp eq i32* [[PTR_NEXT]], [[PTR_END:%.*]]
				; DEFAULT-NEXT: br i1 [[CMP_NOT]], label [[EXIT:%.*]], label [[FOR_BODY]]
				; DEFAULT: exit:
				; DEFAULT-NEXT: [[RES:%.*]] = phi i32 [ [[COND]], [[FOR_BODY]] ]
				; DEFAULT-NEXT: ret i32 [[RES]]
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i.014 = phi i32 [ 20, %entry ], [ %cond, %for.body ]			%i.014 = phi i32 [ 20, %entry ], [ %cond, %for.body ]
	%ptr.iv = phi i32* [ %ptr.start, %entry ], [ %ptr.next, %for.body ]			%ptr.iv = phi i32* [ %ptr.start, %entry ], [ %ptr.next, %for.body ]
	%cmp4 = icmp slt i32 %i.014, 99			%cmp4 = icmp slt i32 %i.014, 99
	%cond = select i1 %cmp4, i32 99, i32 %i.014			%cond = select i1 %cmp4, i32 99, i32 %i.014

llvm/test/Transforms/LoopVectorize/tripcount.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; This test verifies that the loop vectorizer will not vectorizes low trip count			; This test verifies that the loop vectorizer will not vectorizes low trip count
	; loops that require runtime checks (Trip count is computed with profile info).			; loops that require runtime checks (Trip count is computed with profile info).
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -loop-vectorize-with-block-frequency -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -loop-vectorize-with-block-frequency -S | FileCheck %s

	target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"			target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"

	@tab = common global [32 x i8] zeroinitializer, align 1			@tab = common global [32 x i8] zeroinitializer, align 1

	define i32 @foo_low_trip_count1(i32 %bound) {			define i32 @foo_low_trip_count1(i32 %bound) {
	; Simple loop with low tripcount. Should not be vectorized.			; Simple loop with low tripcount. Should not be vectorized.

	; CHECK-LABEL: @foo_low_trip_count1(			; CHECK-LABEL: @foo_low_trip_count1(
	; CHECK-NOT: <{{[0-9]+}} x i8>			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[BOUND:%.*]], 1
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP0]], 3
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP0]], 1
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 3
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ule <4 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP4]]
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i8> poison, i8 [[TMP11]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP13:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP12]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP14]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; CHECK: pred.load.if1:
				; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[TMP7]], align 1
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i8> [[TMP13]], i8 [[TMP15]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; CHECK: pred.load.continue2:
				; CHECK-NEXT: [[TMP17:%.*]] = phi <4 x i8> [ [[TMP13]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP16]], [[PRED_LOAD_IF1]] ]
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP18]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; CHECK: pred.load.if3:
				; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[TMP8]], align 1
				; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP17]], i8 [[TMP19]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; CHECK: pred.load.continue4:
				; CHECK-NEXT: [[TMP21:%.*]] = phi <4 x i8> [ [[TMP17]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP20]], [[PRED_LOAD_IF3]] ]
				; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP22]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP23:%.*]] = load i8, i8* [[TMP9]], align 1
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP23]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP25:%.*]] = phi <4 x i8> [ [[TMP21]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP24]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP26:%.*]] = icmp eq <4 x i8> [[TMP25]], zeroinitializer
				; CHECK-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP28]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i8> [[TMP27]], i32 0
				; CHECK-NEXT: store i8 [[TMP29]], i8* [[TMP6]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP30]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; CHECK: pred.store.if7:
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i8> [[TMP27]], i32 1
				; CHECK-NEXT: store i8 [[TMP31]], i8* [[TMP7]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; CHECK: pred.store.continue8:
				; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP32]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; CHECK: pred.store.if9:
				; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i8> [[TMP27]], i32 2
				; CHECK-NEXT: store i8 [[TMP33]], i8* [[TMP8]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; CHECK: pred.store.continue10:
				; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP34]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.if11:
				; CHECK-NEXT: [[TMP35:%.*]] = extractelement <4 x i8> [[TMP27]], i32 3
				; CHECK-NEXT: store i8 [[TMP35]], i8* [[TMP9]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.continue12:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP36:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP36]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF0:![0-9]+]], !llvm.loop [[LOOP1:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP37:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP37]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], [[BOUND]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, %bound			%exitcond = icmp eq i32 %i.08, %bound
	br i1 %exitcond, label %for.end, label %for.body, !prof !1			br i1 %exitcond, label %for.end, label %for.body, !prof !1

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @foo_low_trip_count2(i32 %bound) !prof !0 {			define i32 @foo_low_trip_count2(i32 %bound) !prof !0 {
	; The loop has a same invocation count with the function, but has a low			; The loop has a same invocation count with the function, but has a low
	; trip_count per invocation and not worth to vectorize.			; trip_count per invocation and not worth to vectorize.

	; CHECK-LABEL: @foo_low_trip_count2(			; CHECK-LABEL: @foo_low_trip_count2(
	; CHECK-NOT: <{{[0-9]+}} x i8>			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[BOUND:%.*]], 1
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP0]], 3
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP0]], 1
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 3
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ule <4 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP4]]
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i8> poison, i8 [[TMP11]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP13:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP12]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP14]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; CHECK: pred.load.if1:
				; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[TMP7]], align 1
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i8> [[TMP13]], i8 [[TMP15]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; CHECK: pred.load.continue2:
				; CHECK-NEXT: [[TMP17:%.*]] = phi <4 x i8> [ [[TMP13]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP16]], [[PRED_LOAD_IF1]] ]
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP18]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; CHECK: pred.load.if3:
				; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[TMP8]], align 1
				; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP17]], i8 [[TMP19]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; CHECK: pred.load.continue4:
				; CHECK-NEXT: [[TMP21:%.*]] = phi <4 x i8> [ [[TMP17]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP20]], [[PRED_LOAD_IF3]] ]
				; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP22]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP23:%.*]] = load i8, i8* [[TMP9]], align 1
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP23]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP25:%.*]] = phi <4 x i8> [ [[TMP21]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP24]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP26:%.*]] = icmp eq <4 x i8> [[TMP25]], zeroinitializer
				; CHECK-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP28]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i8> [[TMP27]], i32 0
				; CHECK-NEXT: store i8 [[TMP29]], i8* [[TMP6]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP30]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; CHECK: pred.store.if7:
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i8> [[TMP27]], i32 1
				; CHECK-NEXT: store i8 [[TMP31]], i8* [[TMP7]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; CHECK: pred.store.continue8:
				; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP32]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; CHECK: pred.store.if9:
				; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i8> [[TMP27]], i32 2
				; CHECK-NEXT: store i8 [[TMP33]], i8* [[TMP8]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; CHECK: pred.store.continue10:
				; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP34]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.if11:
				; CHECK-NEXT: [[TMP35:%.*]] = extractelement <4 x i8> [[TMP27]], i32 3
				; CHECK-NEXT: store i8 [[TMP35]], i8* [[TMP9]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.continue12:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP36:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP36]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP7:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP37:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP37]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], [[BOUND]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !prof [[PROF3]], !llvm.loop [[LOOP8:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, %bound			%exitcond = icmp eq i32 %i.08, %bound
	br i1 %exitcond, label %for.end, label %for.body, !prof !1			br i1 %exitcond, label %for.end, label %for.body, !prof !1

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @foo_low_trip_count3(i1 %cond, i32 %bound) !prof !0 {			define i32 @foo_low_trip_count3(i1 %cond, i32 %bound) !prof !0 {
	; The loop has low invocation count compare to the function invocation count,			; The loop has low invocation count compare to the function invocation count,
	; but has a high trip count per invocation. Vectorize it.			; but has a high trip count per invocation. Vectorize it.

	; CHECK-LABEL: @foo_low_trip_count3(			; CHECK-LABEL: @foo_low_trip_count3(
	; CHECK: [[VECTOR_BODY:vector\.body]]:			; CHECK-NEXT: entry:
	; CHECK: br i1 [[TMP9:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP3:\!.*]],			; CHECK-NEXT: br i1 [[COND:%.*]], label [[FOR_PREHEADER:%.*]], label [[FOR_END:%.*]], !prof [[PROF9:![0-9]+]]
	; CHECK: [[FOR_BODY:for\.body]]:			; CHECK: for.preheader:
	; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP6:\!.*]],			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[BOUND:%.*]], 1
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 4
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <4 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1
				; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
				; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <4 x i8>*
				; CHECK-NEXT: store <4 x i8> [[TMP6]], <4 x i8>* [[TMP7]], align 1
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF10:![0-9]+]], !llvm.loop [[LOOP11:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_PREHEADER]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP9]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_08]], [[BOUND]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT]], label [[FOR_BODY]], !prof [[PROF12:![0-9]+]], !llvm.loop [[LOOP13:![0-9]+]]
				; CHECK: for.end.loopexit:
				; CHECK-NEXT: br label [[FOR_END]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;
	entry:			entry:
	br i1 %cond, label %for.preheader, label %for.end, !prof !2			br i1 %cond, label %for.preheader, label %for.end, !prof !2

	for.preheader:			for.preheader:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %for.preheader ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %for.preheader ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp eq i32 %i.08, %bound			%exitcond = icmp eq i32 %i.08, %bound
	br i1 %exitcond, label %for.end, label %for.body, !prof !3			br i1 %exitcond, label %for.end, label %for.body, !prof !3

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @foo_low_trip_count_icmp_sgt(i32 %bound) {			define i32 @foo_low_trip_count_icmp_sgt(i32 %bound) {
	; Simple loop with low tripcount and inequality test for exit.			; Simple loop with low tripcount and inequality test for exit.
	; Should not be vectorized.			; Should not be vectorized.

	; CHECK-LABEL: @foo_low_trip_count_icmp_sgt(			; CHECK-LABEL: @foo_low_trip_count_icmp_sgt(
	; CHECK-NOT: <{{[0-9]+}} x i8>			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[BOUND:%.*]], i32 -1)
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[SMAX]], 2
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP0]], 3
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
				; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP0]], 1
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TRIP_COUNT_MINUS_1]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 3
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ule <4 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP4]]
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i8> poison, i8 [[TMP11]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP13:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP12]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP14]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; CHECK: pred.load.if1:
				; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[TMP7]], align 1
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i8> [[TMP13]], i8 [[TMP15]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; CHECK: pred.load.continue2:
				; CHECK-NEXT: [[TMP17:%.*]] = phi <4 x i8> [ [[TMP13]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP16]], [[PRED_LOAD_IF1]] ]
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP18]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; CHECK: pred.load.if3:
				; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[TMP8]], align 1
				; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i8> [[TMP17]], i8 [[TMP19]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; CHECK: pred.load.continue4:
				; CHECK-NEXT: [[TMP21:%.*]] = phi <4 x i8> [ [[TMP17]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP20]], [[PRED_LOAD_IF3]] ]
				; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP22]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP23:%.*]] = load i8, i8* [[TMP9]], align 1
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x i8> [[TMP21]], i8 [[TMP23]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP25:%.*]] = phi <4 x i8> [ [[TMP21]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP24]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP26:%.*]] = icmp eq <4 x i8> [[TMP25]], zeroinitializer
				; CHECK-NEXT: [[TMP27:%.*]] = select <4 x i1> [[TMP26]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP28]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i8> [[TMP27]], i32 0
				; CHECK-NEXT: store i8 [[TMP29]], i8* [[TMP6]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP30]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; CHECK: pred.store.if7:
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i8> [[TMP27]], i32 1
				; CHECK-NEXT: store i8 [[TMP31]], i8* [[TMP7]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; CHECK: pred.store.continue8:
				; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i1> [[TMP5]], i32 2
				; CHECK-NEXT: br i1 [[TMP32]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; CHECK: pred.store.if9:
				; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i8> [[TMP27]], i32 2
				; CHECK-NEXT: store i8 [[TMP33]], i8* [[TMP8]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; CHECK: pred.store.continue10:
				; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i1> [[TMP5]], i32 3
				; CHECK-NEXT: br i1 [[TMP34]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.if11:
				; CHECK-NEXT: [[TMP35:%.*]] = extractelement <4 x i8> [[TMP27]], i32 3
				; CHECK-NEXT: store i8 [[TMP35]], i8* [[TMP9]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.continue12:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP36:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP36]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF0]], !llvm.loop [[LOOP14:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP37:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP37]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp sgt i32 [[I_08]], [[BOUND]]
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !prof [[PROF3]], !llvm.loop [[LOOP15:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp sgt i32 %i.08, %bound			%exitcond = icmp sgt i32 %i.08, %bound
	br i1 %exitcond, label %for.end, label %for.body, !prof !1			br i1 %exitcond, label %for.end, label %for.body, !prof !1

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @const_low_trip_count() {			define i32 @const_low_trip_count() {
	; Simple loop with constant, small trip count and no profiling info.			; Simple loop with constant, small trip count and no profiling info.
				; CHECK-LABEL: @const_low_trip_count(
	; CHECK-LABEL: @const_low_trip_count			; CHECK-NEXT: entry:
	; CHECK-NOT: <{{[0-9]+}} x i8>			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE12]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = icmp ule <4 x i32> [[VEC_IND]], <i32 2, i32 2, i32 2, i32 2>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; CHECK-NEXT: br i1 [[TMP9]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP5]], align 1
				; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; CHECK: pred.load.if1:
				; CHECK-NEXT: [[TMP14:%.*]] = load i8, i8* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x i8> [[TMP12]], i8 [[TMP14]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; CHECK: pred.load.continue2:
				; CHECK-NEXT: [[TMP16:%.*]] = phi <4 x i8> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP15]], [[PRED_LOAD_IF1]] ]
				; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; CHECK-NEXT: br i1 [[TMP17]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; CHECK: pred.load.if3:
				; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* [[TMP7]], align 1
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> [[TMP16]], i8 [[TMP18]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; CHECK: pred.load.continue4:
				; CHECK-NEXT: [[TMP20:%.*]] = phi <4 x i8> [ [[TMP16]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP19]], [[PRED_LOAD_IF3]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; CHECK-NEXT: br i1 [[TMP21]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP8]], align 1
				; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP22]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP24:%.*]] = phi <4 x i8> [ [[TMP20]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP25:%.*]] = icmp eq <4 x i8> [[TMP24]], zeroinitializer
				; CHECK-NEXT: [[TMP26:%.*]] = select <4 x i1> [[TMP25]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; CHECK-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i8> [[TMP26]], i32 0
				; CHECK-NEXT: store i8 [[TMP28]], i8* [[TMP5]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; CHECK-NEXT: br i1 [[TMP29]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; CHECK: pred.store.if7:
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i8> [[TMP26]], i32 1
				; CHECK-NEXT: store i8 [[TMP30]], i8* [[TMP6]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; CHECK: pred.store.continue8:
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; CHECK-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; CHECK: pred.store.if9:
				; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i8> [[TMP26]], i32 2
				; CHECK-NEXT: store i8 [[TMP32]], i8* [[TMP7]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; CHECK: pred.store.continue10:
				; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; CHECK-NEXT: br i1 [[TMP33]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.if11:
				; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i8> [[TMP26]], i32 3
				; CHECK-NEXT: store i8 [[TMP34]], i8* [[TMP8]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.continue12:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP35:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4
				; CHECK-NEXT: br i1 [[TMP35]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 4, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP36]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp slt i32 [[I_08]], 2
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP17:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp slt i32 %i.08, 2			%exitcond = icmp slt i32 %i.08, 2
	br i1 %exitcond, label %for.body, label %for.end			br i1 %exitcond, label %for.body, label %for.end

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @const_large_trip_count() {			define i32 @const_large_trip_count() {
	; Simple loop with constant large trip count and no profiling info.			; Simple loop with constant large trip count and no profiling info.
				; CHECK-LABEL: @const_large_trip_count(
	; CHECK-LABEL: @const_large_trip_count			; CHECK-NEXT: entry:
	; CHECK: <{{[0-9]+}} x i8>			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
				; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; CHECK-NEXT: store <4 x i8> [[TMP5]], <4 x i8>* [[TMP6]], align 1
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 1001, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP8]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp slt i32 [[I_08]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP19:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp slt i32 %i.08, 1000			%exitcond = icmp slt i32 %i.08, 1000
	br i1 %exitcond, label %for.body, label %for.end			br i1 %exitcond, label %for.body, label %for.end

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @const_small_trip_count_step() {			define i32 @const_small_trip_count_step() {
	; Simple loop with static, small trip count and no profiling info.			; Simple loop with static, small trip count and no profiling info.
				; CHECK-LABEL: @const_small_trip_count_step(
	; CHECK-LABEL: @const_small_trip_count_step			; CHECK-NEXT: entry:
	; CHECK-NOT: <{{[0-9]+}} x i8>			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
				; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i32 [[INDEX]], 5
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[OFFSET_IDX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], 5
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], 10
				; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], 15
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[VEC_IV:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: [[TMP4:%.*]] = icmp ule <4 x i32> [[VEC_IV]], <i32 2, i32 2, i32 2, i32 2>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP1]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP2]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP3]]
				; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; CHECK-NEXT: br i1 [[TMP9]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[TMP5]], align 1
				; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
				; CHECK: pred.load.continue:
				; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x i8> [ poison, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_LOAD_IF]] ]
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]]
				; CHECK: pred.load.if1:
				; CHECK-NEXT: [[TMP14:%.*]] = load i8, i8* [[TMP6]], align 1
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x i8> [[TMP12]], i8 [[TMP14]], i32 1
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE2]]
				; CHECK: pred.load.continue2:
				; CHECK-NEXT: [[TMP16:%.*]] = phi <4 x i8> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP15]], [[PRED_LOAD_IF1]] ]
				; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; CHECK-NEXT: br i1 [[TMP17]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4:%.*]]
				; CHECK: pred.load.if3:
				; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* [[TMP7]], align 1
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i8> [[TMP16]], i8 [[TMP18]], i32 2
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
				; CHECK: pred.load.continue4:
				; CHECK-NEXT: [[TMP20:%.*]] = phi <4 x i8> [ [[TMP16]], [[PRED_LOAD_CONTINUE2]] ], [ [[TMP19]], [[PRED_LOAD_IF3]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; CHECK-NEXT: br i1 [[TMP21]], label [[PRED_LOAD_IF5:%.*]], label [[PRED_LOAD_CONTINUE6:%.*]]
				; CHECK: pred.load.if5:
				; CHECK-NEXT: [[TMP22:%.*]] = load i8, i8* [[TMP8]], align 1
				; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x i8> [[TMP20]], i8 [[TMP22]], i32 3
				; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE6]]
				; CHECK: pred.load.continue6:
				; CHECK-NEXT: [[TMP24:%.*]] = phi <4 x i8> [ [[TMP20]], [[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], [[PRED_LOAD_IF5]] ]
				; CHECK-NEXT: [[TMP25:%.*]] = icmp eq <4 x i8> [[TMP24]], zeroinitializer
				; CHECK-NEXT: [[TMP26:%.*]] = select <4 x i1> [[TMP25]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
				; CHECK-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i8> [[TMP26]], i32 0
				; CHECK-NEXT: store i8 [[TMP28]], i8* [[TMP5]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP4]], i32 1
				; CHECK-NEXT: br i1 [[TMP29]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
				; CHECK: pred.store.if7:
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i8> [[TMP26]], i32 1
				; CHECK-NEXT: store i8 [[TMP30]], i8* [[TMP6]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE8]]
				; CHECK: pred.store.continue8:
				; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[TMP4]], i32 2
				; CHECK-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
				; CHECK: pred.store.if9:
				; CHECK-NEXT: [[TMP32:%.*]] = extractelement <4 x i8> [[TMP26]], i32 2
				; CHECK-NEXT: store i8 [[TMP32]], i8* [[TMP7]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE10]]
				; CHECK: pred.store.continue10:
				; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x i1> [[TMP4]], i32 3
				; CHECK-NEXT: br i1 [[TMP33]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.if11:
				; CHECK-NEXT: [[TMP34:%.*]] = extractelement <4 x i8> [[TMP26]], i32 3
				; CHECK-NEXT: store i8 [[TMP34]], i8* [[TMP8]], align 1
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
				; CHECK: pred.store.continue12:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
				; CHECK-NEXT: [[TMP35:%.*]] = icmp eq i32 [[INDEX_NEXT]], 4
				; CHECK-NEXT: br i1 [[TMP35]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 20, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP36:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP36]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 5
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp slt i32 [[I_08]], 10
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP21:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 5			%inc = add nsw i32 %i.08, 5
	%exitcond = icmp slt i32 %i.08, 10			%exitcond = icmp slt i32 %i.08, 10
	br i1 %exitcond, label %for.body, label %for.end			br i1 %exitcond, label %for.body, label %for.end

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	define i32 @const_trip_over_profile() {			define i32 @const_trip_over_profile() {
	; constant trip count takes precedence over profile data			; constant trip count takes precedence over profile data
				; CHECK-LABEL: @const_trip_over_profile(
	; CHECK-LABEL: @const_trip_over_profile			; CHECK-NEXT: entry:
	; CHECK: <{{[0-9]+}} x i8>			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
				; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i8> <i8 2, i8 2, i8 2, i8 2>, <4 x i8> <i8 1, i8 1, i8 1, i8 1>
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast i8* [[TMP2]] to <4 x i8>*
				; CHECK-NEXT: store <4 x i8> [[TMP5]], <4 x i8>* [[TMP6]], align 1
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 1001, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
				; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP8]], 0
				; CHECK-NEXT: [[DOT:%.*]] = select i1 [[CMP1]], i8 2, i8 1
				; CHECK-NEXT: store i8 [[DOT]], i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I_08]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp slt i32 [[I_08]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END]], !prof [[PROF3]], !llvm.loop [[LOOP23:![0-9]+]]
				; CHECK: for.end:
				; CHECK-NEXT: ret i32 0
				;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08			%arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.08
	%0 = load i8, i8* %arrayidx, align 1			%0 = load i8, i8* %arrayidx, align 1
	%cmp1 = icmp eq i8 %0, 0			%cmp1 = icmp eq i8 %0, 0
	%. = select i1 %cmp1, i8 2, i8 1			%. = select i1 %cmp1, i8 2, i8 1
	store i8 %., i8* %arrayidx, align 1			store i8 %., i8* %arrayidx, align 1
	%inc = add nsw i32 %i.08, 1			%inc = add nsw i32 %i.08, 1
	%exitcond = icmp slt i32 %i.08, 1000			%exitcond = icmp slt i32 %i.08, 1000
	br i1 %exitcond, label %for.body, label %for.end, !prof !1			br i1 %exitcond, label %for.body, label %for.end, !prof !1

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret i32 0			ret i32 0
	}			}

	; CHECK: [[LP3]] = !{!"branch_weights", i32 10, i32 2490}
	; CHECK: [[LP6]] = !{!"branch_weights", i32 10, i32 0}
	; original loop has latchExitWeight=10 and backedgeTakenWeight=10,000,			; original loop has latchExitWeight=10 and backedgeTakenWeight=10,000,
	; therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001.			; therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001.
	; Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1			; Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1
	; for vectorized and remainder loops, respectively, therefore their			; for vectorized and remainder loops, respectively, therefore their
	; estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with			; estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with
	; loop invocation weights of 10 are the above {10, 2490} and {10, 0}.			; loop invocation weights of 10 are the above {10, 2490} and {10, 0}.

	!0 = !{!"function_entry_count", i64 100}			!0 = !{!"function_entry_count", i64 100}
	!1 = !{!"branch_weights", i32 100, i32 0}			!1 = !{!"branch_weights", i32 100, i32 0}
	!2 = !{!"branch_weights", i32 10, i32 90}			!2 = !{!"branch_weights", i32 10, i32 90}
	!3 = !{!"branch_weights", i32 10, i32 10000}			!3 = !{!"branch_weights", i32 10, i32 10000}

llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: Successor(s): loop.3			; CHECK-NEXT: Successor(s): loop.3
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.3:			; CHECK-NEXT: loop.3:
	; CHECK-NEXT: WIDEN ir<%c.0> = icmp ir<%iv>, ir<%j>			; CHECK-NEXT: WIDEN ir<%c.0> = icmp ir<%iv>, ir<%j>
	; CHECK-NEXT: Successor(s): then.0			; CHECK-NEXT: Successor(s): then.0
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: then.0:			; CHECK-NEXT: then.0:
	; CHECK-NEXT: WIDEN ir<%mul> = mul vp<[[PRED1]]>, vp<[[PRED2]]>
	; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> ir<%c.0> ir<false>			; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> ir<%c.0> ir<false>
	; CHECK-NEXT: Successor(s): pred.store			; CHECK-NEXT: Successor(s): pred.store
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <xVFxUF> pred.store: {			; CHECK-NEXT: <xVFxUF> pred.store: {
	; CHECK-NEXT: pred.store.entry:			; CHECK-NEXT: pred.store.entry:
	; CHECK-NEXT: BRANCH-ON-MASK vp<[[MASK2]]>			; CHECK-NEXT: BRANCH-ON-MASK vp<[[MASK2]]>
	; CHECK-NEXT: Successor(s): pred.store.if, pred.store.continue			; CHECK-NEXT: Successor(s): pred.store.if, pred.store.continue
	; CHECK-NEXT: CondBit: vp<[[MASK2]]> (then.0)			; CHECK-NEXT: CondBit: vp<[[MASK2]]> (then.0)
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: pred.store.if:			; CHECK-NEXT: pred.store.if:
				; CHECK-NEXT: REPLICATE ir<%mul> = mul vp<[[PRED1]]>, vp<[[PRED2]]>
	; CHECK-NEXT: REPLICATE ir<%gep.c.1> = getelementptr ir<@c>, ir<0>, ir<%iv>			; CHECK-NEXT: REPLICATE ir<%gep.c.1> = getelementptr ir<@c>, ir<0>, ir<%iv>
	; CHECK-NEXT: REPLICATE store ir<%mul>, ir<%gep.c.1>			; CHECK-NEXT: REPLICATE store ir<%mul>, ir<%gep.c.1>
	; CHECK-NEXT: Successor(s): pred.store.continue			; CHECK-NEXT: Successor(s): pred.store.continue
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: pred.store.continue:			; CHECK-NEXT: pred.store.continue:
	; CHECK-NEXT: No successors			; CHECK-NEXT: No successors
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: Successor(s): then.0.0			; CHECK-NEXT: Successor(s): then.0.0
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: then.0.0:			; CHECK-NEXT: then.0.0:
	; CHECK-NEXT: Successor(s): latch			; CHECK-NEXT: Successor(s): latch
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: latch:			; CHECK-NEXT: latch:
	; CHECK-NEXT: CLONE ir<%large> = icmp ir<%iv>, ir<8>			; CHECK-NEXT: CLONE ir<%large> = icmp ir<%iv>, ir<8>
	; CHECK-NEXT: CLONE ir<%exitcond> = icmp ir<%iv>, ir<%k>			; CHECK-NEXT: CLONE ir<%exitcond> = icmp ir<%iv>, ir<%k>
	; CHECK-NEXT: EMIT vp<[[CAN_IV_NEXT:%.+]]> = VF * UF + vp<[[CAN_IV]]>			; CHECK-NEXT: EMIT vp<[[CAN_IV_NEXT:%.+]]> = VF * UF + vp<[[CAN_IV]]>
	; CHECK-NEXT: EMIT branch-on-count vp<[[CAN_IV_NEXT]]> vp<[[VEC_TC]]>			; CHECK-NEXT: EMIT branch-on-count vp<[[CAN_IV_NEXT]]> vp<[[VEC_TC]]>
	; CHECK-NEXT: No successors			; CHECK-NEXT: No successors
	; CHECK-NEXT: }			; CHECK-NEXT: }
				; CHECK-NEXT: No successors
				; CHECK-NEXT: }
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
	%gep.a = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i32 0, i32 %iv			%gep.a = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i32 0, i32 %iv
	%lv.a = load i32, i32* %gep.a, align 4			%lv.a = load i32, i32* %gep.a, align 4