This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
ClosedPublic

Authored by RKSimon on Jul 21 2021, 8:18 AM.

Details

Summary

As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))", if the inner Ptr was itself a (now one-use) gep we can merge the two geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl::visitGetElementPtrInst doesn't have the one-use limitation, but it only creates an add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)
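
For illustration, a rough before/after sketch of the basic fold in IR (the value names and element type here are made up for this example, not taken from the patch's tests):

  ; Before: the inner gep has a single use
  %inner = getelementptr i32, i32* %ptr, i64 %idx0
  %out = getelementptr i32, i32* %inner, i64 %idx1

  ; After: a single gep off the original pointer, indexed by the sum
  %out.idx = add i64 %idx0, %idx1
  %out = getelementptr i32, i32* %ptr, i64 %out.idx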

Diff Detail

Event Timeline

RKSimon created this revision.Jul 21 2021, 8:18 AM
RKSimon requested review of this revision.Jul 21 2021, 8:18 AM
Herald added a project: Restricted Project. · Jul 21 2021, 8:18 AM
lebedev.ri added inline comments.Jul 21 2021, 8:33 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2134

I believe this add is nsw iff NewGEP is inbounds.
alive2 times out / runs out of memory even with 10min/64GB limits,
but that means it fails to disprove it, which is a good sign.
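
To make the nsw/inbounds relationship concrete, here is a hand-written sketch (illustrative names; this restates the claim above rather than proving it, and the alive2 runs are the real check):

  ; Merged gep is inbounds: the new index add may carry nsw
  %p.idx = add nsw i64 %idx0, %idx1
  %p = getelementptr inbounds i32, i32* %ptr, i64 %p.idx

  ; Merged gep is not inbounds: the add must stay a plain (wrapping) add
  %q.idx = add i64 %idx0, %idx1
  %q = getelementptr i32, i32* %ptr, i64 %q.idx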

RKSimon updated this revision to Diff 360492.Jul 21 2021, 9:03 AM

Set nsw if the new gep is inbounds

lebedev.ri accepted this revision.Jul 21 2021, 9:04 AM

LG, thank you.

This revision is now accepted and ready to land.Jul 21 2021, 9:04 AM
This revision was landed with ongoing or failed builds.Jul 22 2021, 3:16 AM
This revision was automatically updated to reflect the committed changes.

FWIW I think the obvious generalization here is that only the outer GEP needs to have two operands;
we just need to fold its index operand into the last operand of the inner one-use GEP:
https://alive2.llvm.org/ce/z/UTBPnG
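
A rough IR illustration of that first variant (made-up names and element types; the alive2 link above is the authoritative version):

  ; Before: the outer gep has a single index, the inner one-use gep has several
  %inner = getelementptr [4 x i32], [4 x i32]* %base, i64 %i, i64 %j
  %out = getelementptr i32, i32* %inner, i64 %k

  ; After: %k is folded into the last index of the inner gep
  %sum = add i64 %j, %k
  %out = getelementptr [4 x i32], [4 x i32]* %base, i64 %i, i64 %sum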

Or the other way around: the inner GEP could have only two operands,
and then we just need to fold its index into the first index of the outer GEP:
https://alive2.llvm.org/ce/z/xWVW8J
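
And a similar sketch for the second variant (again with made-up names; see the alive2 link):

  ; Before: the inner one-use gep has a single index
  %inner = getelementptr [4 x i32], [4 x i32]* %base, i64 %i
  %out = getelementptr [4 x i32], [4 x i32]* %inner, i64 %k, i64 %j

  ; After: %i is folded into the first index of the outer gep
  %sum = add i64 %i, %k
  %out = getelementptr [4 x i32], [4 x i32]* %base, i64 %sum, i64 %j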

But I guess these aren't really interesting for the select-of-geps, outside of the bigger pattern at least.

So i think this should be something like

if (Src->hasOneUse() &&
    GEP.getOperand(1)->getType() ==
        Src->getOperand(Src->getNumOperands() - 1)->getType()) {
  // Fold gep(gep(Ptr, Idx0), Idx1) -> gep(Ptr, add(Idx0, Idx1))
  bool NewInBounds = isMergedGEPInBounds(*Src, *cast<GEPOperator>(&GEP));
  SmallVector<Value *, 8> Indices;
  Indices.reserve(Src->getNumIndices() + GEP.getNumIndices() - 1);
  append_range(Indices, Src->indices());
  Indices.back() = Builder.CreateAdd(
      *GEP.idx_begin(), Indices.back(), GEP.getName() + ".idx",
      /*HasNUW*/ false, /*HasNSW*/ NewInBounds);
  append_range(Indices, make_range(GEP.idx_begin() + 1, GEP.idx_end()));
  auto *NewGEP = GetElementPtrInst::Create(
      Src->getSourceElementType(), Src->getPointerOperand(), Indices);
  NewGEP->setIsInBounds(NewInBounds);
  return NewGEP;
}

But I don't think I will pursue it further than this snippet.

Cheers - I'll take a look at adding the fold

This change is causing performance degradation on one of our important benchmarks. Here is an example showing the likely cause:

@block = global i32 zeroinitializer, align 4

define void @foo(i32* %j, i32 %k) {
entry:
  br label %for.body

for.body:
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %invariant0 = getelementptr i32, i32* @block, i32 %k
  %x = getelementptr i32, i32* %invariant0, i32 %i.01
  store i32 2, i32* %x, align 4
  %invariant1 = getelementptr i32, i32* @block, i32 %k
  %0 = load i32, i32* %j
  %y = getelementptr i32, i32* %invariant1, i32 %0
  store i32 1, i32* %y, align 4
  %inc = add nsw i32 %i.01, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %for.body, label %for.end

for.end:
  ret void
}

Without this change, %invariant0 and %invariant1 are CSE'd, and %invariant0 is hoisted out of the loop by LICM.

bash-5.0$ opt t.ll -S -early-cse -licm
@block = global i32 0, align 4

define void @foo(i32* %j, i32 %k) {
entry:
  %invariant0 = getelementptr i32, i32* @block, i32 %k
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %x = getelementptr i32, i32* %invariant0, i32 %i.01
  store i32 2, i32* %x, align 4
  %0 = load i32, i32* %j, align 4
  %y = getelementptr i32, i32* %invariant0, i32 %0
  store i32 1, i32* %y, align 4
  %inc = add nsw i32 %i.01, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

After this change, %invariant0 and %invariant1 are combined away by InstCombine and turned into add instructions. There is no longer a loop-invariant instruction for LICM to hoist, nor a common subexpression to eliminate.

bash-5.0$ opt t.ll -S -instcombine -early-cse -licm
@block = global i32 0, align 4

define void @foo(i32* %j, i32 %k) {
entry:
  %0 = sext i32 %k to i64
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %1 = zext i32 %i.01 to i64
  %x.idx = add nsw i64 %1, %0
  %x = getelementptr i32, i32* @block, i64 %x.idx
  store i32 2, i32* %x, align 4
  %2 = load i32, i32* %j, align 4
  %3 = sext i32 %2 to i64
  %y.idx = add nsw i64 %3, %0
  %y = getelementptr i32, i32* @block, i64 %y.idx
  store i32 1, i32* %y, align 4
  %inc = add nuw nsw i32 %i.01, 1
  %cmp = icmp ult i32 %i.01, 99
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

The problem is more significant when the instructions are in a deeply nested loop or when the loop has many iterations.

Can we either revert this change, or guard it behind an option that is disabled by default, until there is a solution?

Generally, I do wonder if we are missing something like SeparateConstOffsetFromGEPPass from the pipeline.
But here, can you please provide an end-to-end test?
As far as I can see, we always run EarlyCSE before InstCombine.

@Whitney which target is this?

I was testing on a Power AIX machine.

@Whitney we need the end-to-end test case to have a better understanding of where your original IR is appearing and why CSE isn't solving this - running it through -O3 seems to be fine: https://c.godbolt.org/z/jWYcra73x (vs just -instcombine), which suggests a phase-ordering issue is at play.

We are working on another reduced LLVM IR test case that shows the problem with -O3. Unfortunately, the important benchmark is huge and we are not allowed to share it.

@RKSimon we've discovered that this commit is causing substantial performance regressions (~11% runtime) in some of our benchmarks on different x86 microarchitectures (Haswell, Skylake). @wmi is trying to get an isolated test.

OK - do you know offhand if it's another CSE / loop-invariance screwup?

I suspect that while we want to apply D107935, it's only half of the story - what if LICM hasn't run yet?
We need some kind of undo transform. (And no, SeparateConstOffsetFromGEP doesn't help; I tried.)

Given that we have a test case and multiple people have reported regressions, can we revert in the meantime?

Especially if there's no obvious and clear forward fix, I'd appreciate if you could unblock us by reverting for now. Thanks!

Except that reverting regresses other benchmarks that I was addressing with this patch - Bullet, etc.

Is the regression fixed by this patch more serious than the one it introduced? Is it clear how to fix the new regression?

I believe Bullet > “two internal *micro*benchmarks”

I'm going to be on PTO again soon, so I'm going to revert this change to unblock you; I'll revisit it when I get back. For x86 at least, I believe we'll be able to handle this with basic CMOV handling inside X86DAGToDAGISel::matchAddressRecursively.

But for @Whitney's AIX case I'm still not sure.

FWIW it's not a microbenchmark per se. It's basically compression/decompression algorithms around this code base: https://github.com/google/snappy. The particular data corpus is internal, but the regression was significant enough and widespread across a number of other benchmarks as well. In addition, it seemed to exist across multiple microarchitectures (we've seen it on Power above, and I'd hypothesize we'd see it on ARM). Simon's comments feel right about where the problem is likely coming from as well.

Thanks for the additional details, yes - snappy is important.

Please can somebody confirm that rG10c982e0b3e6d46d1fe288d7dbe0a393c65a640f resolves the regressions you were seeing?

Thanks! We'll get performance testing results tomorrow.

Confirming that performance regressions are gone after rG10c982e0b3e6d46d1fe288d7dbe0a393c65a640f. Thanks again!