This is an archive of the discontinued LLVM Phabricator instance.

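Paths

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll
llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll
llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll
llvm/test/Transforms/SLPVectorizer/SystemZ/gep-indices.ll
llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll
llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll
llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll
llvm/test/Transforms/SLPVectorizer/X86/partail.ll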

Differential D144128

[SLP] Check with target before vectorizing GEP Indices
ClosedPublic

Authored by jonpa on Feb 15 2023, 10:59 AM.

Details

Reviewers

uweigand
ABataev
dmgreen
RKSimon
spatel
fhahn
SjoerdMeijer

Commits

rG1387a13e1d0b: [SLP] Check with target before vectorizing GEP Indices.

Summary

The target hook prefersVectorizedAddressing() already exists to check with target if address computations should be vectorized, so it seems like this could be used in SLPVectorizer as well.

This gives some changes on SystemZ, but it doesn't seem to matter much (on SPEC). Some ~100 less (expensive) extractions into address registers.

Some test changes on AArch64 and X86, have not looked into them, but it seems these subtargets now change when "gather" is unsupported, which makes sense, I think.
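
As a rough illustration of the idea in this summary, the sketch below gates the collection of GEP seeds on a target preference. It is a self-contained model, not LLVM code: MockTarget, MockInst and collectGEPSeeds are hypothetical names, and only the method name prefersVectorizedAddressing() mirrors the real hook.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the target; the real hook lives in
// TargetTransformInfo and is only mirrored here by name.
struct MockTarget {
  bool Prefers;
  bool prefersVectorizedAddressing() const { return Prefers; }
};

// Hypothetical, heavily simplified instruction record.
enum class Opcode { Store, GetElementPtr, Other };
struct MockInst {
  Opcode Op;
  std::string Name;
};

// Collect the names of GEPs that may serve as vectorization seeds.
// GEPs are skipped entirely when the target does not prefer
// vectorized addressing.
std::vector<std::string> collectGEPSeeds(const std::vector<MockInst> &Block,
                                         const MockTarget &TTI) {
  std::vector<std::string> Seeds;
  for (const MockInst &I : Block) {
    if (I.Op != Opcode::GetElementPtr)
      continue;
    if (!TTI.prefersVectorizedAddressing())
      continue;
    Seeds.push_back(I.Name);
  }
  return Seeds;
}
```

With a target that returns false from the hook, no GEP in the block becomes a seed, which is the behaviour the summary describes for SystemZ.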

Diff Detail

Event Timeline

jonpa created this revision. Feb 15 2023, 10:59 AM

Herald added a project: Restricted Project. Feb 15 2023, 10:59 AM

Herald added subscribers: vporpo, ctetreau, pengfei and 2 others.

jonpa requested review of this revision. Feb 15 2023, 10:59 AM

Herald added subscribers: llvm-commits, pcwang-thead.

Harbormaster completed remote builds in B213950: Diff 497745. Feb 15 2023, 12:02 PM

Can you try moving this check into the NotProfitableForVectorization lambda in the buildTree_rec function, making it return true if S.getOpcode() == Instruction::GetElementPtr && !TTI->prefersVectorizedAddressing()?

Can you try moving this check into the NotProfitableForVectorization lambda in the buildTree_rec function, making it return true if S.getOpcode() == Instruction::GetElementPtr && !TTI->prefersVectorizedAddressing()?

I added this, and it did seem to have some effect, although only two fewer extractions on SystemZ/SPEC.

ABataev added inline comments. Feb 16 2023, 5:44 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
11605–11612	Could you try to remove this change and keep just the change in NotProfitableForVectorization?

jonpa marked an inline comment as done. Feb 16 2023, 5:52 AM

jonpa added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
11605–11612	sorry - yeah I tried that, but then those changes disappeared, so it seems that these two points do not overlap.

Could you check the throughput of (some of) the tests with and without your changes in godbolt?

Harbormaster completed remote builds in B214139: Diff 497990. Feb 16 2023, 7:03 AM

In D144128#4131910, @ABataev wrote:

Could you check the throughput of (some of) the tests with and without your changes in godbolt?

Hmm, that sounds interesting, but I don't quite know how to go about that. Is godbolt supposed to show cycles somehow?

On SystemZ, this is a rather simple issue: extractions from vector registers into GPRs are considered expensive, so it is basically always wrong to vectorize address computations. I was hoping that other targets could inspect their test changes and decide from those whether it looks ok...?
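
To make the extraction argument concrete, here is a minimal sketch with made-up cost units. AddressCosts and the three functions are hypothetical and do not correspond to any LLVM cost-model API; the numbers in the usage note are illustrative only.

```cpp
#include <cassert>

// Made-up cost units for one target; none of these come from LLVM.
struct AddressCosts {
  int ScalarIndexOp; // cost of computing one index in a GPR
  int VectorIndexOp; // cost of computing all indices with one vector op
  int ExtractToGPR;  // cost of moving one vector lane into a GPR
};

// Cost of computing Lanes indices with scalar instructions.
int scalarIndexCost(const AddressCosts &C, int Lanes) {
  return Lanes * C.ScalarIndexOp;
}

// Cost of computing the indices in a vector register and then
// extracting every lane, since each address is still needed in a GPR.
int vectorIndexCost(const AddressCosts &C, int Lanes) {
  return C.VectorIndexOp + Lanes * C.ExtractToGPR;
}

// True when the vector form is strictly cheaper than the scalar form.
bool vectorIndexIsCheaper(const AddressCosts &C, int Lanes) {
  return vectorIndexCost(C, Lanes) < scalarIndexCost(C, Lanes);
}
```

With, say, AddressCosts{1, 1, 4} and four lanes, the scalar form costs 4 and the vector form costs 17, so the per-lane extraction dominates. Only when extraction is nearly free does the vector form win in this model.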

ABataev added reviewers: dmgreen, RKSimon, spatel. Feb 16 2023, 9:32 AM

Herald added a subscriber: StephenFan. Feb 16 2023, 9:32 AM

The X86 changes look fine, although it would be better if we had more avx512 (or avx2 with fast-gather) test coverage

I think this change is OK for AArch64 too, I don't think it will change much in practice. Some of the tests may not be testing what they did in the past though.

This revision is now accepted and ready to land. Feb 20 2023, 4:31 AM

RKSimon mentioned this in rGd9bceeedbf0f: [SLP][X86] load-merge.ll - add AVX512 test coverage. Feb 20 2023, 3:21 PM

RKSimon mentioned this in rG2ca266dc1aa3: [SLP][X86] minimum-sizes.ll - add AVX512 test coverage. Feb 20 2023, 3:32 PM

@jonpa I've tried to improve x86 gather test coverage - please can you rebase?

Patch rebased with updated X86 tests. minimum-sizes.ll no longer updated automatically with the "CHECK" prefix, so I added different prefixes to make the update succeed.

Harbormaster completed remote builds in B215026: Diff 499177. Feb 21 2023, 9:40 AM

LGTM - cheers

This revision was landed with ongoing or failed builds. Feb 23 2023, 6:32 AM

Closed by commit rG1387a13e1d0b: [SLP] Check with target before vectorizing GEP Indices. (authored by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG1387a13e1d0b: [SLP] Check with target before vectorizing GEP Indices.

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

In D144128#4199448, @zjaffal wrote:

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

To add to what @zjaffal said, I am curious why the current cost-modeling wouldn't be sufficient to prevent cases where vectorizing GEPs would not be profitable?

The case @zjaffal shared requires a more costly vector GEP, but the cost is offset by the benefits from vectorizing the rest of the tree.
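
The difference between a cost-based decision and a hard gate can be sketched as follows. This is a toy model under stated assumptions: NodeCost, treeCostDelta, profitableByCost and allowedByHardGate are hypothetical names, and the "accept when the summed delta is negative" rule is a simplification of how a real cost model decides.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-node summary: Delta is vector cost minus scalar cost,
// so a negative Delta means the vector form is cheaper for that node.
struct NodeCost {
  bool IsGEP;
  int Delta;
};

// Sum of the per-node deltas for the whole tree.
int treeCostDelta(const std::vector<NodeCost> &Tree) {
  int Total = 0;
  for (const NodeCost &N : Tree)
    Total += N.Delta;
  return Total;
}

// Cost-based decision: accept the tree when it is cheaper overall.
bool profitableByCost(const std::vector<NodeCost> &Tree) {
  return treeCostDelta(Tree) < 0;
}

// Hard gate: reject any tree with a GEP node when the target does not
// prefer vectorized addressing, regardless of the total cost.
bool allowedByHardGate(const std::vector<NodeCost> &Tree,
                       bool PrefersVectorizedAddressing) {
  if (PrefersVectorizedAddressing)
    return true;
  for (const NodeCost &N : Tree)
    if (N.IsGEP)
      return false;
  return true;
}
```

For a tree with one GEP node of delta +2 and two other nodes of delta -3 and -2, the cost-based check accepts the tree (total -3), while the hard gate rejects it on a target that does not prefer vectorized addressing.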

In D144128#4199494, @fhahn wrote:

In D144128#4199448, @zjaffal wrote:

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

To add to what @zjaffal said, I am curious why the current cost-modeling wouldn't be sufficient to prevent cases where vectorizing GEPs would not be profitable?

The case @zjaffal shared requires a more costly vector GEP, but the cost is offset by the benefits from vectorizing the rest of the tree.

Sorry about your regression. I looked for a RUN line in the "godbolt" link, but could not find one.

As I recall, the basic idea here was to not look at GEPs in collectSeedInstructions() for a target that does not like to vectorize address computations. On SystemZ, this would mean having to use an expensive vector element extract instruction, which is never really a good idea. Since the tuning of the vectorizers with cost functions is far from perfect, I think it's good to add a broad general heuristic if possible to avoid doing "bad things".

I added this check in collectSeedInstructions(), but during review it was then also added in NotProfitableForVectorization(). I am curious if it helps your problem to remove the change in the latter function?

This patch is not super important for SystemZ, but rather just "probably a good idea". Not much code change. It would be kind of ok to revert it if it really causes a regression. On the other hand, perhaps you should consider returning false from prefersVectorizedAddressing(), if you don't generally prefer this heuristic?

In D144128#4202886, @jonpa wrote:

Sorry about your regression. I looked for a RUN line in the "godbolt" link, but could not find one.

I think it is just opt -passes=slp-vectorizer.

As I recall, the basic idea here was to not look at GEPs in collectSeedInstructions() for a target that does not like to vectorize address computations. On SystemZ, this would mean having to use an expensive vector element extract instruction, which is never really a good idea. Since the tuning of the vectorizers with cost functions is far from perfect, I think it's good to add a broad general heuristic if possible to avoid doing "bad things".

I added this check in collectSeedInstructions(), but during review it was then also added in NotProfitableForVectorization(). I am curious if it helps your problem to remove the change in the latter function?

This patch is not super important for SystemZ, but rather just "probably a good idea". Not much code change. It would be kind of ok to revert it if it really causes a regression. On the other hand, perhaps you should consider returning false from prefersVectorizedAddressing(), if you don't generally prefer this heuristic?

Thanks for elaborating on the motivation! IMO it would be slightly preferable to adjust the cost on SystemZ to reflect an accurate cost (and prevent SLP from using it if the whole tree isn't profitable overall). One issue with using prefersVectorizedAddressing seems to be that it indicates a preference (and is used as such in other uses), whereas here it is used to not even attempt using vector addresses if it is the only option. The AArch64 backend in general prefers non-vector addresses if possible, but they are costed relatively accurately and can be beneficial if the larger tree offsets the cost.

So unfortunately changing prefersVectorizedAddressing would potentially have undesirable side-effects. It might be better to revert the commit for now and then recommit it while avoiding the regressions. While I don't have any performance data to back it up I think that X86 would likely be impacted as well by a similar issue to AArch64.

zjaffal mentioned this in D146540: [SLP] Add test to check for GEP vectorization. Mar 21 2023, 9:21 AM

It would be ok for me if you reverted it while investigating regressions. I would hope then that you would try the "partial revert" meaning removing just the change in NotProfitableForVectorization().

Since targets are different and cost tuning is coarse and difficult, maybe it would make sense to change this hook so that there are different levels of preference. It should be possible to use it as is, but for your targets you could keep using it but only where you wanted to...
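
One way to picture "different levels of preference" is a three-valued setting instead of a boolean. The sketch below is purely hypothetical: AddressingPreference and shouldVectorizeAddressing are invented for illustration and are not an existing or proposed LLVM interface.

```cpp
#include <cassert>

// Hypothetical replacement for a boolean preference; not an LLVM API.
enum class AddressingPreference {
  Never,        // never vectorize address computations
  IfProfitable, // leave the decision to the cost model
  Always        // vectorize whenever it is legal
};

// TreeCostDelta is vector cost minus scalar cost for the whole tree,
// so a negative value means vectorizing is cheaper overall.
bool shouldVectorizeAddressing(AddressingPreference Pref, int TreeCostDelta) {
  switch (Pref) {
  case AddressingPreference::Never:
    return false;
  case AddressingPreference::IfProfitable:
    return TreeCostDelta < 0;
  case AddressingPreference::Always:
    return true;
  }
  return false; // unreachable for valid enum values
}
```

In this model a target like SystemZ could pick Never, while a target that trusts its cost model could pick IfProfitable and still vectorize GEPs when the rest of the tree pays for them.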

zjaffal mentioned this in rG984b46e6cc2a: [SLP] Add test to check for GEP vectorization. Mar 27 2023, 9:53 AM

fhahn added a reverting change: rG417fe52e6fb4: Revert "[SLP] Check with target before vectorizing GEP Indices.". Mar 28 2023, 12:07 AM

In D144128#4220571, @jonpa wrote:

It would be ok for me if you reverted it while investigating regressions. I would hope then that you would try the "partial revert" meaning removing just the change in NotProfitableForVectorization().

Unfortunately that didn't help, so I went ahead and reverted the whole commit for now.

Since targets are different and cost tuning is coarse and difficult, maybe it would make sense to change this hook so that there are different levels of preference. It should be possible to use it as is, but for your targets you could keep using it but only where you wanted to...

Sounds reasonable to me! Another issue that might be worth checking out is why llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll isn't vectorized except for the vector-gep when those GEPs are not part of the seed instructions.

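Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	25 lines
llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll	51 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll	500 lines
llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll	189 lines
llvm/test/Transforms/SLPVectorizer/SystemZ/gep-indices.ll	48 lines
llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll	50 lines
llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll	50 lines
llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll	59 lines
llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll	33 lines
llvm/test/Transforms/SLPVectorizer/X86/partail.ll	46 lines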

Diff 497990

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
// If we deal with insert/extract instructions, they all must have constant		// If we deal with insert/extract instructions, they all must have constant
// indices, otherwise we should gather them, not try to vectorize.		// indices, otherwise we should gather them, not try to vectorize.
// If alternate op node with 2 elements with gathered operands - do not		// If alternate op node with 2 elements with gathered operands - do not
// vectorize.		// vectorize.
auto &&NotProfitableForVectorization = [&S, this,		auto &&NotProfitableForVectorization = [&S, this,
Depth](ArrayRef<Value *> VL) {		Depth](ArrayRef<Value *> VL) {
if (!S.getOpcode() || !S.isAltShuffle() || VL.size() > 2)		if (!S.getOpcode() || !S.isAltShuffle() || VL.size() > 2)
return false;		return false;
		if (S.getOpcode() == Instruction::GetElementPtr &&
		!TTI->prefersVectorizedAddressing())
		return true;
if (VectorizableTree.size() < MinTreeSize)		if (VectorizableTree.size() < MinTreeSize)
return false;		return false;
if (Depth >= RecursionMaxDepth - 1)		if (Depth >= RecursionMaxDepth - 1)
return true;		return true;
// Check if all operands are extracts, part of vector node or can build a		// Check if all operands are extracts, part of vector node or can build a
// regular vectorize node.		// regular vectorize node.
SmallVector<unsigned, 2> InstsCount(VL.size(), 0);		SmallVector<unsigned, 2> InstsCount(VL.size(), 0);
for (Value *V : VL) {		for (Value *V : VL) {
for (Instruction &I : *BB) {
// Ignore store instructions that are volatile or have a pointer operand		// Ignore store instructions that are volatile or have a pointer operand
// that doesn't point to a scalar type.		// that doesn't point to a scalar type.
if (auto *SI = dyn_cast<StoreInst>(&I)) {		if (auto *SI = dyn_cast<StoreInst>(&I)) {
if (!SI->isSimple())		if (!SI->isSimple())
continue;		continue;
if (!isValidElementType(SI->getValueOperand()->getType()))		if (!isValidElementType(SI->getValueOperand()->getType()))
continue;		continue;
Stores[getUnderlyingObject(SI->getPointerOperand())].push_back(SI);		Stores[getUnderlyingObject(SI->getPointerOperand())].push_back(SI);
		continue;
}		}

// Ignore getelementptr instructions that have more than one index, a		// Ignore getelementptr instructions that have more than one index, a
// constant index, or a pointer operand that doesn't point to a scalar		// constant index, or a pointer operand that doesn't point to a scalar
// type.		// type.
else if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {		if (TTI->prefersVectorizedAddressing())
		if (auto *GEP = dyn_cast<GetElementPtrInst>(&I)) {
auto Idx = GEP->idx_begin()->get();		auto Idx = GEP->idx_begin()->get();
if (GEP->getNumIndices() > 1 || isa<Constant>(Idx))		if (GEP->getNumIndices() > 1 || isa<Constant>(Idx))
continue;		continue;
if (!isValidElementType(Idx->getType()))		if (!isValidElementType(Idx->getType()))
continue;		continue;
if (GEP->getType()->isVectorTy())		if (GEP->getType()->isVectorTy())
continue;		continue;
GEPs[GEP->getPointerOperand()].push_back(GEP);		GEPs[GEP->getPointerOperand()].push_back(GEP);
}		}
}		}
}		}

bool SLPVectorizerPass::tryToVectorizePair(Value *A, Value *B, BoUpSLP &R) {		bool SLPVectorizerPass::tryToVectorizePair(Value *A, Value *B, BoUpSLP &R) {
if (!A || !B)		if (!A || !B)
return false;		return false;
if (isa<InsertElementInst>(A) || isa<InsertElementInst>(B))		if (isa<InsertElementInst>(A) || isa<InsertElementInst>(B))
return false;		return false;

llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64--linux-gnu < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64--linux-gnu < %s | FileCheck %s

	target datalayout = "e-m:e-i32:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i32:64-i128:128-n32:64-S128"

	declare void @foo(i64, i64, i64, i64)			declare void @foo(i64, i64, i64, i64)

	define void @test1(<4 x i16> %a, <4 x i16> %b, ptr %p) {			define void @test1(<4 x i16> %a, <4 x i16> %b, ptr %p) {
	; Make sure types of sub and its sources are not extended.			; Make sure types of sub and its sources are not extended.
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[TMP0:%.*]] = sext <4 x i32> [[SUB0]] to <4 x i64>			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i64> [[TMP0]], i32 0			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[S0]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i64> [[TMP0]], i32 1			; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP2]]			; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[S1]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i64> [[TMP0]], i32 2			; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP3]]			; CHECK-NEXT: [[S2:%.*]] = sext i32 [[E2]] to i64
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[S2]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4			; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP0]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP4]]			; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[S3]]
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1
	}			}

	define void @test2(<4 x i16> %a, <4 x i16> %b, i64 %c0, i64 %c1, i64 %c2, i64 %c3, ptr %p) {			define void @test2(<4 x i16> %a, <4 x i16> %b, i64 %c0, i64 %c1, i64 %c2, i64 %c3, ptr %p) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[TMP0:%.*]] = sext <4 x i32> [[SUB0]] to <4 x i64>			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[C0:%.*]], i32 0			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i64> [[TMP1]], i64 [[C1:%.*]], i32 1			; CHECK-NEXT: [[A0:%.*]] = add i64 [[S0]], [[C0:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[C2:%.*]], i32 2			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[A0]]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[C3:%.*]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP0]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP6]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 1			; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP7]]			; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64
				; CHECK-NEXT: [[A1:%.*]] = add i64 [[S1]], [[C1:%.*]]
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A1]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i64> [[TMP5]], i32 2			; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP8]]			; CHECK-NEXT: [[S2:%.*]] = sext i32 [[E2]] to i64
				; CHECK-NEXT: [[A2:%.*]] = add i64 [[S2]], [[C2:%.*]]
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A2]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4			; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i64> [[TMP5]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP9]]			; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64
				; CHECK-NEXT: [[A3:%.*]] = add i64 [[S3]], [[C3:%.*]]
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A3]]
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1

llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8			; GENERIC-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 1
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, ptr [[A_ADDR_0101]], align 2			; GENERIC-NEXT: [[TMP0:%.*]] = load i16, ptr [[A_ADDR_0101]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i64
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, ptr [[B:%.*]], align 2			; GENERIC-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i16, ptr [[B:%.*]], i64 1
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP1:%.*]] = load i16, ptr [[B]], align 2
	; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; GENERIC-NEXT: [[CONV2:%.*]] = zext i16 [[TMP1]] to i64
	; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; GENERIC-NEXT: [[SUB:%.*]] = sub nsw i64 [[CONV]], [[CONV2]]
	; GENERIC-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64			; GENERIC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[SUB]]
	; GENERIC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[TMP8]]			; GENERIC-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; GENERIC-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; GENERIC-NEXT: [[CONV3:%.*]] = zext i16 [[TMP2]] to i32
	; GENERIC-NEXT: [[CONV3:%.*]] = zext i16 [[TMP9]] to i32
	; GENERIC-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]			; GENERIC-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]
	; GENERIC-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; GENERIC-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 2
	; GENERIC-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; GENERIC-NEXT: [[TMP3:%.*]] = load i16, ptr [[INCDEC_PTR]], align 2
	; GENERIC-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP11]]			; GENERIC-NEXT: [[CONV5:%.*]] = zext i16 [[TMP3]] to i64
	; GENERIC-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2			; GENERIC-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 2
	; GENERIC-NEXT: [[CONV11:%.*]] = zext i16 [[TMP12]] to i32			; GENERIC-NEXT: [[TMP4:%.*]] = load i16, ptr [[INCDEC_PTR1]], align 2
				; GENERIC-NEXT: [[CONV7:%.*]] = zext i16 [[TMP4]] to i64
				; GENERIC-NEXT: [[SUB8:%.*]] = sub nsw i64 [[CONV5]], [[CONV7]]
				; GENERIC-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB8]]
				; GENERIC-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2
				; GENERIC-NEXT: [[CONV11:%.*]] = zext i16 [[TMP5]] to i32
	; GENERIC-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]			; GENERIC-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]
	; GENERIC-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; GENERIC-NEXT: [[INCDEC_PTR13:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 3
	; GENERIC-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64			; GENERIC-NEXT: [[TMP6:%.*]] = load i16, ptr [[INCDEC_PTR4]], align 2
	; GENERIC-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP14]]			; GENERIC-NEXT: [[CONV14:%.*]] = zext i16 [[TMP6]] to i64
	; GENERIC-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2			; GENERIC-NEXT: [[INCDEC_PTR15:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 3
	; GENERIC-NEXT: [[CONV20:%.*]] = zext i16 [[TMP15]] to i32			; GENERIC-NEXT: [[TMP7:%.*]] = load i16, ptr [[INCDEC_PTR6]], align 2
				; GENERIC-NEXT: [[CONV16:%.*]] = zext i16 [[TMP7]] to i64
				; GENERIC-NEXT: [[SUB17:%.*]] = sub nsw i64 [[CONV14]], [[CONV16]]
				; GENERIC-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB17]]
				; GENERIC-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2
				; GENERIC-NEXT: [[CONV20:%.*]] = zext i16 [[TMP8]] to i32
	; GENERIC-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]			; GENERIC-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]
	; GENERIC-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; GENERIC-NEXT: [[INCDEC_PTR22:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 4
	; GENERIC-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64			; GENERIC-NEXT: [[TMP9:%.*]] = load i16, ptr [[INCDEC_PTR13]], align 2
	; GENERIC-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP17]]			; GENERIC-NEXT: [[CONV23:%.*]] = zext i16 [[TMP9]] to i64
	; GENERIC-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2			; GENERIC-NEXT: [[INCDEC_PTR24:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 4
	; GENERIC-NEXT: [[CONV29:%.*]] = zext i16 [[TMP18]] to i32			; GENERIC-NEXT: [[TMP10:%.*]] = load i16, ptr [[INCDEC_PTR15]], align 2
				; GENERIC-NEXT: [[CONV25:%.*]] = zext i16 [[TMP10]] to i64
				; GENERIC-NEXT: [[SUB26:%.*]] = sub nsw i64 [[CONV23]], [[CONV25]]
				; GENERIC-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB26]]
				; GENERIC-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2
				; GENERIC-NEXT: [[CONV29:%.*]] = zext i16 [[TMP11]] to i32
	; GENERIC-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]			; GENERIC-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]
	; GENERIC-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; GENERIC-NEXT: [[INCDEC_PTR31:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 5
	; GENERIC-NEXT: [[TMP20:%.*]] = sext i32 [[TMP19]] to i64			; GENERIC-NEXT: [[TMP12:%.*]] = load i16, ptr [[INCDEC_PTR22]], align 2
	; GENERIC-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP20]]			; GENERIC-NEXT: [[CONV32:%.*]] = zext i16 [[TMP12]] to i64
	; GENERIC-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2			; GENERIC-NEXT: [[INCDEC_PTR33:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 5
	; GENERIC-NEXT: [[CONV38:%.*]] = zext i16 [[TMP21]] to i32			; GENERIC-NEXT: [[TMP13:%.*]] = load i16, ptr [[INCDEC_PTR24]], align 2
				; GENERIC-NEXT: [[CONV34:%.*]] = zext i16 [[TMP13]] to i64
				; GENERIC-NEXT: [[SUB35:%.*]] = sub nsw i64 [[CONV32]], [[CONV34]]
				; GENERIC-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB35]]
				; GENERIC-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2
				; GENERIC-NEXT: [[CONV38:%.*]] = zext i16 [[TMP14]] to i32
	; GENERIC-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]			; GENERIC-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]
	; GENERIC-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; GENERIC-NEXT: [[INCDEC_PTR40:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 6
	; GENERIC-NEXT: [[TMP23:%.*]] = sext i32 [[TMP22]] to i64			; GENERIC-NEXT: [[TMP15:%.*]] = load i16, ptr [[INCDEC_PTR31]], align 2
	; GENERIC-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP23]]			; GENERIC-NEXT: [[CONV41:%.*]] = zext i16 [[TMP15]] to i64
	; GENERIC-NEXT: [[TMP24:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2			; GENERIC-NEXT: [[INCDEC_PTR42:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 6
	; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; GENERIC-NEXT: [[TMP16:%.*]] = load i16, ptr [[INCDEC_PTR33]], align 2
				; GENERIC-NEXT: [[CONV43:%.*]] = zext i16 [[TMP16]] to i64
				; GENERIC-NEXT: [[SUB44:%.*]] = sub nsw i64 [[CONV41]], [[CONV43]]
				; GENERIC-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB44]]
				; GENERIC-NEXT: [[TMP17:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2
				; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP17]] to i32
	; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; GENERIC-NEXT: [[INCDEC_PTR49:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 7
	; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; GENERIC-NEXT: [[TMP18:%.*]] = load i16, ptr [[INCDEC_PTR40]], align 2
	; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP26]]			; GENERIC-NEXT: [[CONV50:%.*]] = zext i16 [[TMP18]] to i64
	; GENERIC-NEXT: [[TMP27:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2			; GENERIC-NEXT: [[INCDEC_PTR51:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 7
	; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; GENERIC-NEXT: [[TMP19:%.*]] = load i16, ptr [[INCDEC_PTR42]], align 2
				; GENERIC-NEXT: [[CONV52:%.*]] = zext i16 [[TMP19]] to i64
				; GENERIC-NEXT: [[SUB53:%.*]] = sub nsw i64 [[CONV50]], [[CONV52]]
				; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB53]]
				; GENERIC-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2
				; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP20]] to i32
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP21:%.*]] = load i16, ptr [[INCDEC_PTR49]], align 2
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[CONV59:%.*]] = zext i16 [[TMP21]] to i64
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP22:%.*]] = load i16, ptr [[INCDEC_PTR51]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV61:%.*]] = zext i16 [[TMP22]] to i64
				; GENERIC-NEXT: [[SUB62:%.*]] = sub nsw i64 [[CONV59]], [[CONV61]]
				; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB62]]
				; GENERIC-NEXT: [[TMP23:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2
				; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP23]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	; KRYO-LABEL: @gather_reduce_8x16_i32(			; KRYO-LABEL: @gather_reduce_8x16_i32(
	; KRYO-NEXT: entry:			; KRYO-NEXT: entry:
	; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0			; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; KRYO: for.body.preheader:			; KRYO: for.body.preheader:
	; KRYO-NEXT: br label [[FOR_BODY:%.*]]			; KRYO-NEXT: br label [[FOR_BODY:%.*]]
	; KRYO: for.cond.cleanup.loopexit:			; KRYO: for.cond.cleanup.loopexit:
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8			; KRYO-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 1
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, ptr [[A_ADDR_0101]], align 2			; KRYO-NEXT: [[TMP0:%.*]] = load i16, ptr [[A_ADDR_0101]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i64
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, ptr [[B:%.*]], align 2			; KRYO-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i16, ptr [[B:%.*]], i64 1
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP1:%.*]] = load i16, ptr [[B]], align 2
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[CONV2:%.*]] = zext i16 [[TMP1]] to i64
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[SUB:%.*]] = sub nsw i64 [[CONV]], [[CONV2]]
	; KRYO-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64			; KRYO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[SUB]]
	; KRYO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[TMP8]]			; KRYO-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; KRYO-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; KRYO-NEXT: [[CONV3:%.*]] = zext i16 [[TMP2]] to i32
	; KRYO-NEXT: [[CONV3:%.*]] = zext i16 [[TMP9]] to i32
	; KRYO-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]			; KRYO-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]
	; KRYO-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; KRYO-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 2
	; KRYO-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; KRYO-NEXT: [[TMP3:%.*]] = load i16, ptr [[INCDEC_PTR]], align 2
	; KRYO-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP11]]			; KRYO-NEXT: [[CONV5:%.*]] = zext i16 [[TMP3]] to i64
	; KRYO-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2			; KRYO-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 2
	; KRYO-NEXT: [[CONV11:%.*]] = zext i16 [[TMP12]] to i32			; KRYO-NEXT: [[TMP4:%.*]] = load i16, ptr [[INCDEC_PTR1]], align 2
				; KRYO-NEXT: [[CONV7:%.*]] = zext i16 [[TMP4]] to i64
				; KRYO-NEXT: [[SUB8:%.*]] = sub nsw i64 [[CONV5]], [[CONV7]]
				; KRYO-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB8]]
				; KRYO-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2
				; KRYO-NEXT: [[CONV11:%.*]] = zext i16 [[TMP5]] to i32
	; KRYO-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]			; KRYO-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]
	; KRYO-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; KRYO-NEXT: [[INCDEC_PTR13:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 3
	; KRYO-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64			; KRYO-NEXT: [[TMP6:%.*]] = load i16, ptr [[INCDEC_PTR4]], align 2
	; KRYO-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP14]]			; KRYO-NEXT: [[CONV14:%.*]] = zext i16 [[TMP6]] to i64
	; KRYO-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2			; KRYO-NEXT: [[INCDEC_PTR15:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 3
	; KRYO-NEXT: [[CONV20:%.*]] = zext i16 [[TMP15]] to i32			; KRYO-NEXT: [[TMP7:%.*]] = load i16, ptr [[INCDEC_PTR6]], align 2
				; KRYO-NEXT: [[CONV16:%.*]] = zext i16 [[TMP7]] to i64
				; KRYO-NEXT: [[SUB17:%.*]] = sub nsw i64 [[CONV14]], [[CONV16]]
				; KRYO-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB17]]
				; KRYO-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2
				; KRYO-NEXT: [[CONV20:%.*]] = zext i16 [[TMP8]] to i32
	; KRYO-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]			; KRYO-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]
	; KRYO-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; KRYO-NEXT: [[INCDEC_PTR22:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 4
	; KRYO-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64			; KRYO-NEXT: [[TMP9:%.*]] = load i16, ptr [[INCDEC_PTR13]], align 2
	; KRYO-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP17]]			; KRYO-NEXT: [[CONV23:%.*]] = zext i16 [[TMP9]] to i64
	; KRYO-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2			; KRYO-NEXT: [[INCDEC_PTR24:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 4
	; KRYO-NEXT: [[CONV29:%.*]] = zext i16 [[TMP18]] to i32			; KRYO-NEXT: [[TMP10:%.*]] = load i16, ptr [[INCDEC_PTR15]], align 2
				; KRYO-NEXT: [[CONV25:%.*]] = zext i16 [[TMP10]] to i64
				; KRYO-NEXT: [[SUB26:%.*]] = sub nsw i64 [[CONV23]], [[CONV25]]
				; KRYO-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB26]]
				; KRYO-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2
				; KRYO-NEXT: [[CONV29:%.*]] = zext i16 [[TMP11]] to i32
	; KRYO-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]			; KRYO-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]
	; KRYO-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; KRYO-NEXT: [[INCDEC_PTR31:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 5
	; KRYO-NEXT: [[TMP20:%.*]] = sext i32 [[TMP19]] to i64			; KRYO-NEXT: [[TMP12:%.*]] = load i16, ptr [[INCDEC_PTR22]], align 2
	; KRYO-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP20]]			; KRYO-NEXT: [[CONV32:%.*]] = zext i16 [[TMP12]] to i64
	; KRYO-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2			; KRYO-NEXT: [[INCDEC_PTR33:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 5
	; KRYO-NEXT: [[CONV38:%.*]] = zext i16 [[TMP21]] to i32			; KRYO-NEXT: [[TMP13:%.*]] = load i16, ptr [[INCDEC_PTR24]], align 2
				; KRYO-NEXT: [[CONV34:%.*]] = zext i16 [[TMP13]] to i64
				; KRYO-NEXT: [[SUB35:%.*]] = sub nsw i64 [[CONV32]], [[CONV34]]
				; KRYO-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB35]]
				; KRYO-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2
				; KRYO-NEXT: [[CONV38:%.*]] = zext i16 [[TMP14]] to i32
	; KRYO-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]			; KRYO-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]
	; KRYO-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; KRYO-NEXT: [[INCDEC_PTR40:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 6
	; KRYO-NEXT: [[TMP23:%.*]] = sext i32 [[TMP22]] to i64			; KRYO-NEXT: [[TMP15:%.*]] = load i16, ptr [[INCDEC_PTR31]], align 2
	; KRYO-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP23]]			; KRYO-NEXT: [[CONV41:%.*]] = zext i16 [[TMP15]] to i64
	; KRYO-NEXT: [[TMP24:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2			; KRYO-NEXT: [[INCDEC_PTR42:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 6
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[TMP16:%.*]] = load i16, ptr [[INCDEC_PTR33]], align 2
				; KRYO-NEXT: [[CONV43:%.*]] = zext i16 [[TMP16]] to i64
				; KRYO-NEXT: [[SUB44:%.*]] = sub nsw i64 [[CONV41]], [[CONV43]]
				; KRYO-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB44]]
				; KRYO-NEXT: [[TMP17:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2
				; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP17]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[INCDEC_PTR49:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 7
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP18:%.*]] = load i16, ptr [[INCDEC_PTR40]], align 2
	; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[CONV50:%.*]] = zext i16 [[TMP18]] to i64
	; KRYO-NEXT: [[TMP27:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[INCDEC_PTR51:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 7
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[TMP19:%.*]] = load i16, ptr [[INCDEC_PTR42]], align 2
				; KRYO-NEXT: [[CONV52:%.*]] = zext i16 [[TMP19]] to i64
				; KRYO-NEXT: [[SUB53:%.*]] = sub nsw i64 [[CONV50]], [[CONV52]]
				; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB53]]
				; KRYO-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2
				; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP20]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP21:%.*]] = load i16, ptr [[INCDEC_PTR49]], align 2
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[CONV59:%.*]] = zext i16 [[TMP21]] to i64
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP22:%.*]] = load i16, ptr [[INCDEC_PTR51]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV61:%.*]] = zext i16 [[TMP22]] to i64
				; KRYO-NEXT: [[SUB62:%.*]] = sub nsw i64 [[CONV59]], [[CONV61]]
				; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB62]]
				; KRYO-NEXT: [[TMP23:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2
				; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP23]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	entry:			entry:
	%cmp.99 = icmp sgt i32 %n, 0			%cmp.99 = icmp sgt i32 %n, 0
	br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8			; GENERIC-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 1
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, ptr [[A_ADDR_0101]], align 2			; GENERIC-NEXT: [[TMP0:%.*]] = load i16, ptr [[A_ADDR_0101]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i64
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, ptr [[B:%.*]], align 2			; GENERIC-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i16, ptr [[B:%.*]], i64 1
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP1:%.*]] = load i16, ptr [[B]], align 2
	; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; GENERIC-NEXT: [[CONV2:%.*]] = zext i16 [[TMP1]] to i64
	; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; GENERIC-NEXT: [[SUB:%.*]] = sub nsw i64 [[CONV]], [[CONV2]]
	; GENERIC-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64			; GENERIC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[SUB]]
	; GENERIC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[TMP8]]			; GENERIC-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; GENERIC-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; GENERIC-NEXT: [[CONV3:%.*]] = zext i16 [[TMP2]] to i32
	; GENERIC-NEXT: [[CONV3:%.*]] = zext i16 [[TMP9]] to i32
	; GENERIC-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]			; GENERIC-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]
	; GENERIC-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; GENERIC-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 2
	; GENERIC-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; GENERIC-NEXT: [[TMP3:%.*]] = load i16, ptr [[INCDEC_PTR]], align 2
	; GENERIC-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP11]]			; GENERIC-NEXT: [[CONV5:%.*]] = zext i16 [[TMP3]] to i64
	; GENERIC-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2			; GENERIC-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 2
	; GENERIC-NEXT: [[CONV11:%.*]] = zext i16 [[TMP12]] to i32			; GENERIC-NEXT: [[TMP4:%.*]] = load i16, ptr [[INCDEC_PTR1]], align 2
				; GENERIC-NEXT: [[CONV7:%.*]] = zext i16 [[TMP4]] to i64
				; GENERIC-NEXT: [[SUB8:%.*]] = sub nsw i64 [[CONV5]], [[CONV7]]
				; GENERIC-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB8]]
				; GENERIC-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2
				; GENERIC-NEXT: [[CONV11:%.*]] = zext i16 [[TMP5]] to i32
	; GENERIC-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]			; GENERIC-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]
	; GENERIC-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; GENERIC-NEXT: [[INCDEC_PTR13:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 3
	; GENERIC-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64			; GENERIC-NEXT: [[TMP6:%.*]] = load i16, ptr [[INCDEC_PTR4]], align 2
	; GENERIC-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP14]]			; GENERIC-NEXT: [[CONV14:%.*]] = zext i16 [[TMP6]] to i64
	; GENERIC-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2			; GENERIC-NEXT: [[INCDEC_PTR15:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 3
	; GENERIC-NEXT: [[CONV20:%.*]] = zext i16 [[TMP15]] to i32			; GENERIC-NEXT: [[TMP7:%.*]] = load i16, ptr [[INCDEC_PTR6]], align 2
				; GENERIC-NEXT: [[CONV16:%.*]] = zext i16 [[TMP7]] to i64
				; GENERIC-NEXT: [[SUB17:%.*]] = sub nsw i64 [[CONV14]], [[CONV16]]
				; GENERIC-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB17]]
				; GENERIC-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2
				; GENERIC-NEXT: [[CONV20:%.*]] = zext i16 [[TMP8]] to i32
	; GENERIC-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]			; GENERIC-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]
	; GENERIC-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; GENERIC-NEXT: [[INCDEC_PTR22:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 4
	; GENERIC-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64			; GENERIC-NEXT: [[TMP9:%.*]] = load i16, ptr [[INCDEC_PTR13]], align 2
	; GENERIC-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP17]]			; GENERIC-NEXT: [[CONV23:%.*]] = zext i16 [[TMP9]] to i64
	; GENERIC-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2			; GENERIC-NEXT: [[INCDEC_PTR24:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 4
	; GENERIC-NEXT: [[CONV29:%.*]] = zext i16 [[TMP18]] to i32			; GENERIC-NEXT: [[TMP10:%.*]] = load i16, ptr [[INCDEC_PTR15]], align 2
				; GENERIC-NEXT: [[CONV25:%.*]] = zext i16 [[TMP10]] to i64
				; GENERIC-NEXT: [[SUB26:%.*]] = sub nsw i64 [[CONV23]], [[CONV25]]
				; GENERIC-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB26]]
				; GENERIC-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2
				; GENERIC-NEXT: [[CONV29:%.*]] = zext i16 [[TMP11]] to i32
	; GENERIC-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]			; GENERIC-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]
	; GENERIC-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; GENERIC-NEXT: [[INCDEC_PTR31:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 5
	; GENERIC-NEXT: [[TMP20:%.*]] = sext i32 [[TMP19]] to i64			; GENERIC-NEXT: [[TMP12:%.*]] = load i16, ptr [[INCDEC_PTR22]], align 2
	; GENERIC-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP20]]			; GENERIC-NEXT: [[CONV32:%.*]] = zext i16 [[TMP12]] to i64
	; GENERIC-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2			; GENERIC-NEXT: [[INCDEC_PTR33:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 5
	; GENERIC-NEXT: [[CONV38:%.*]] = zext i16 [[TMP21]] to i32			; GENERIC-NEXT: [[TMP13:%.*]] = load i16, ptr [[INCDEC_PTR24]], align 2
				; GENERIC-NEXT: [[CONV34:%.*]] = zext i16 [[TMP13]] to i64
				; GENERIC-NEXT: [[SUB35:%.*]] = sub nsw i64 [[CONV32]], [[CONV34]]
				; GENERIC-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB35]]
				; GENERIC-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2
				; GENERIC-NEXT: [[CONV38:%.*]] = zext i16 [[TMP14]] to i32
	; GENERIC-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]			; GENERIC-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]
	; GENERIC-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; GENERIC-NEXT: [[INCDEC_PTR40:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 6
	; GENERIC-NEXT: [[TMP23:%.*]] = sext i32 [[TMP22]] to i64			; GENERIC-NEXT: [[TMP15:%.*]] = load i16, ptr [[INCDEC_PTR31]], align 2
	; GENERIC-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP23]]			; GENERIC-NEXT: [[CONV41:%.*]] = zext i16 [[TMP15]] to i64
	; GENERIC-NEXT: [[TMP24:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2			; GENERIC-NEXT: [[INCDEC_PTR42:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 6
	; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; GENERIC-NEXT: [[TMP16:%.*]] = load i16, ptr [[INCDEC_PTR33]], align 2
				; GENERIC-NEXT: [[CONV43:%.*]] = zext i16 [[TMP16]] to i64
				; GENERIC-NEXT: [[SUB44:%.*]] = sub nsw i64 [[CONV41]], [[CONV43]]
				; GENERIC-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB44]]
				; GENERIC-NEXT: [[TMP17:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2
				; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP17]] to i32
	; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; GENERIC-NEXT: [[INCDEC_PTR49:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 7
	; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; GENERIC-NEXT: [[TMP18:%.*]] = load i16, ptr [[INCDEC_PTR40]], align 2
	; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP26]]			; GENERIC-NEXT: [[CONV50:%.*]] = zext i16 [[TMP18]] to i64
	; GENERIC-NEXT: [[TMP27:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2			; GENERIC-NEXT: [[INCDEC_PTR51:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 7
	; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; GENERIC-NEXT: [[TMP19:%.*]] = load i16, ptr [[INCDEC_PTR42]], align 2
				; GENERIC-NEXT: [[CONV52:%.*]] = zext i16 [[TMP19]] to i64
				; GENERIC-NEXT: [[SUB53:%.*]] = sub nsw i64 [[CONV50]], [[CONV52]]
				; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB53]]
				; GENERIC-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2
				; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP20]] to i32
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP21:%.*]] = load i16, ptr [[INCDEC_PTR49]], align 2
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[CONV59:%.*]] = zext i16 [[TMP21]] to i64
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP22:%.*]] = load i16, ptr [[INCDEC_PTR51]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV61:%.*]] = zext i16 [[TMP22]] to i64
				; GENERIC-NEXT: [[SUB62:%.*]] = sub nsw i64 [[CONV59]], [[CONV61]]
				; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB62]]
				; GENERIC-NEXT: [[TMP23:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2
				; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP23]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	; KRYO-LABEL: @gather_reduce_8x16_i64(			; KRYO-LABEL: @gather_reduce_8x16_i64(
	; KRYO-NEXT: entry:			; KRYO-NEXT: entry:
	; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0			; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; KRYO: for.body.preheader:			; KRYO: for.body.preheader:
	; KRYO-NEXT: br label [[FOR_BODY:%.*]]			; KRYO-NEXT: br label [[FOR_BODY:%.*]]
	; KRYO: for.cond.cleanup.loopexit:			; KRYO: for.cond.cleanup.loopexit:
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi ptr [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8			; KRYO-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 1
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, ptr [[A_ADDR_0101]], align 2			; KRYO-NEXT: [[TMP0:%.*]] = load i16, ptr [[A_ADDR_0101]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i64
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, ptr [[B:%.*]], align 2			; KRYO-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i16, ptr [[B:%.*]], i64 1
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP1:%.*]] = load i16, ptr [[B]], align 2
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[CONV2:%.*]] = zext i16 [[TMP1]] to i64
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[SUB:%.*]] = sub nsw i64 [[CONV]], [[CONV2]]
	; KRYO-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64			; KRYO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[SUB]]
	; KRYO-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[G:%.*]], i64 [[TMP8]]			; KRYO-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; KRYO-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; KRYO-NEXT: [[CONV3:%.*]] = zext i16 [[TMP2]] to i32
	; KRYO-NEXT: [[CONV3:%.*]] = zext i16 [[TMP9]] to i32
	; KRYO-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]			; KRYO-NEXT: [[ADD:%.*]] = add nsw i32 [[SUM_0102]], [[CONV3]]
	; KRYO-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP6]], i64 1			; KRYO-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 2
	; KRYO-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; KRYO-NEXT: [[TMP3:%.*]] = load i16, ptr [[INCDEC_PTR]], align 2
	; KRYO-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP11]]			; KRYO-NEXT: [[CONV5:%.*]] = zext i16 [[TMP3]] to i64
	; KRYO-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2			; KRYO-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 2
	; KRYO-NEXT: [[CONV11:%.*]] = zext i16 [[TMP12]] to i32			; KRYO-NEXT: [[TMP4:%.*]] = load i16, ptr [[INCDEC_PTR1]], align 2
				; KRYO-NEXT: [[CONV7:%.*]] = zext i16 [[TMP4]] to i64
				; KRYO-NEXT: [[SUB8:%.*]] = sub nsw i64 [[CONV5]], [[CONV7]]
				; KRYO-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB8]]
				; KRYO-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX10]], align 2
				; KRYO-NEXT: [[CONV11:%.*]] = zext i16 [[TMP5]] to i32
	; KRYO-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]			; KRYO-NEXT: [[ADD12:%.*]] = add nsw i32 [[ADD]], [[CONV11]]
	; KRYO-NEXT: [[TMP13:%.*]] = extractelement <8 x i32> [[TMP6]], i64 2			; KRYO-NEXT: [[INCDEC_PTR13:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 3
	; KRYO-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64			; KRYO-NEXT: [[TMP6:%.*]] = load i16, ptr [[INCDEC_PTR4]], align 2
	; KRYO-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP14]]			; KRYO-NEXT: [[CONV14:%.*]] = zext i16 [[TMP6]] to i64
	; KRYO-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2			; KRYO-NEXT: [[INCDEC_PTR15:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 3
	; KRYO-NEXT: [[CONV20:%.*]] = zext i16 [[TMP15]] to i32			; KRYO-NEXT: [[TMP7:%.*]] = load i16, ptr [[INCDEC_PTR6]], align 2
				; KRYO-NEXT: [[CONV16:%.*]] = zext i16 [[TMP7]] to i64
				; KRYO-NEXT: [[SUB17:%.*]] = sub nsw i64 [[CONV14]], [[CONV16]]
				; KRYO-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB17]]
				; KRYO-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX19]], align 2
				; KRYO-NEXT: [[CONV20:%.*]] = zext i16 [[TMP8]] to i32
	; KRYO-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]			; KRYO-NEXT: [[ADD21:%.*]] = add nsw i32 [[ADD12]], [[CONV20]]
	; KRYO-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP6]], i64 3			; KRYO-NEXT: [[INCDEC_PTR22:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 4
	; KRYO-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64			; KRYO-NEXT: [[TMP9:%.*]] = load i16, ptr [[INCDEC_PTR13]], align 2
	; KRYO-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP17]]			; KRYO-NEXT: [[CONV23:%.*]] = zext i16 [[TMP9]] to i64
	; KRYO-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2			; KRYO-NEXT: [[INCDEC_PTR24:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 4
	; KRYO-NEXT: [[CONV29:%.*]] = zext i16 [[TMP18]] to i32			; KRYO-NEXT: [[TMP10:%.*]] = load i16, ptr [[INCDEC_PTR15]], align 2
				; KRYO-NEXT: [[CONV25:%.*]] = zext i16 [[TMP10]] to i64
				; KRYO-NEXT: [[SUB26:%.*]] = sub nsw i64 [[CONV23]], [[CONV25]]
				; KRYO-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB26]]
				; KRYO-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX28]], align 2
				; KRYO-NEXT: [[CONV29:%.*]] = zext i16 [[TMP11]] to i32
	; KRYO-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]			; KRYO-NEXT: [[ADD30:%.*]] = add nsw i32 [[ADD21]], [[CONV29]]
	; KRYO-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP6]], i64 4			; KRYO-NEXT: [[INCDEC_PTR31:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 5
	; KRYO-NEXT: [[TMP20:%.*]] = sext i32 [[TMP19]] to i64			; KRYO-NEXT: [[TMP12:%.*]] = load i16, ptr [[INCDEC_PTR22]], align 2
	; KRYO-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP20]]			; KRYO-NEXT: [[CONV32:%.*]] = zext i16 [[TMP12]] to i64
	; KRYO-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2			; KRYO-NEXT: [[INCDEC_PTR33:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 5
	; KRYO-NEXT: [[CONV38:%.*]] = zext i16 [[TMP21]] to i32			; KRYO-NEXT: [[TMP13:%.*]] = load i16, ptr [[INCDEC_PTR24]], align 2
				; KRYO-NEXT: [[CONV34:%.*]] = zext i16 [[TMP13]] to i64
				; KRYO-NEXT: [[SUB35:%.*]] = sub nsw i64 [[CONV32]], [[CONV34]]
				; KRYO-NEXT: [[ARRAYIDX37:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB35]]
				; KRYO-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX37]], align 2
				; KRYO-NEXT: [[CONV38:%.*]] = zext i16 [[TMP14]] to i32
	; KRYO-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]			; KRYO-NEXT: [[ADD39:%.*]] = add nsw i32 [[ADD30]], [[CONV38]]
	; KRYO-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP6]], i64 5			; KRYO-NEXT: [[INCDEC_PTR40:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 6
	; KRYO-NEXT: [[TMP23:%.*]] = sext i32 [[TMP22]] to i64			; KRYO-NEXT: [[TMP15:%.*]] = load i16, ptr [[INCDEC_PTR31]], align 2
	; KRYO-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP23]]			; KRYO-NEXT: [[CONV41:%.*]] = zext i16 [[TMP15]] to i64
	; KRYO-NEXT: [[TMP24:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2			; KRYO-NEXT: [[INCDEC_PTR42:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 6
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[TMP16:%.*]] = load i16, ptr [[INCDEC_PTR33]], align 2
				; KRYO-NEXT: [[CONV43:%.*]] = zext i16 [[TMP16]] to i64
				; KRYO-NEXT: [[SUB44:%.*]] = sub nsw i64 [[CONV41]], [[CONV43]]
				; KRYO-NEXT: [[ARRAYIDX46:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB44]]
				; KRYO-NEXT: [[TMP17:%.*]] = load i16, ptr [[ARRAYIDX46]], align 2
				; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP17]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[INCDEC_PTR49:%.*]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 7
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP18:%.*]] = load i16, ptr [[INCDEC_PTR40]], align 2
	; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[CONV50:%.*]] = zext i16 [[TMP18]] to i64
	; KRYO-NEXT: [[TMP27:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[INCDEC_PTR51:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 7
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[TMP19:%.*]] = load i16, ptr [[INCDEC_PTR42]], align 2
				; KRYO-NEXT: [[CONV52:%.*]] = zext i16 [[TMP19]] to i64
				; KRYO-NEXT: [[SUB53:%.*]] = sub nsw i64 [[CONV50]], [[CONV52]]
				; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB53]]
				; KRYO-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX55]], align 2
				; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP20]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, ptr [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP21:%.*]] = load i16, ptr [[INCDEC_PTR49]], align 2
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[CONV59:%.*]] = zext i16 [[TMP21]] to i64
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP22:%.*]] = load i16, ptr [[INCDEC_PTR51]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV61:%.*]] = zext i16 [[TMP22]] to i64
				; KRYO-NEXT: [[SUB62:%.*]] = sub nsw i64 [[CONV59]], [[CONV61]]
				; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, ptr [[G]], i64 [[SUB62]]
				; KRYO-NEXT: [[TMP23:%.*]] = load i16, ptr [[ARRAYIDX64]], align 2
				; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP23]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	entry:			entry:
	%cmp.99 = icmp sgt i32 %n, 0			%cmp.99 = icmp sgt i32 %n, 0
	br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup

llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll

; sum += g[2i + w]; sum += g[2i + x];		; sum += g[2i + w]; sum += g[2i + x];
; sum += g[2i + y]; sum += g[2i + z];		; sum += g[2i + y]; sum += g[2i + z];
; }		; }
; return sum;		; return sum;
; }		; }
;		;

; YAML-LABEL: Function: getelementptr_4x32		; YAML-LABEL: Function: getelementptr_4x32
; YAML: --- !Passed		; YAML: --- !Missed
; YAML-NEXT: Pass: slp-vectorizer		; YAML-NEXT: Pass: slp-vectorizer
; YAML-NEXT: Name: VectorizedList		; YAML-NEXT: Name: NotBeneficial
; YAML-NEXT: Function: getelementptr_4x32		; YAML-NEXT: Function: getelementptr_4x32
; YAML-NEXT: Args:		; YAML-NEXT: Args:
; YAML-NEXT: - String: 'SLP vectorized with cost '		; YAML-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
; YAML-NEXT: - Cost: '6'		; YAML-NEXT: - Cost: '-7'
; YAML-NEXT: - String: ' and with tree size '		; YAML-NEXT: - String: ' >= '
; YAML-NEXT: - TreeSize: '3'		; YAML-NEXT: - Treshold: '7'

; YAML: --- !Passed		; YAML: --- !Missed
; YAML-NEXT: Pass: slp-vectorizer		; YAML-NEXT: Pass: slp-vectorizer
; YAML-NEXT: Name: VectorizedList		; YAML-NEXT: Name: NotBeneficial
; YAML-NEXT: Function: getelementptr_4x32		; YAML-NEXT: Function: getelementptr_4x32
; YAML-NEXT: Args:		; YAML-NEXT: Args:
; YAML-NEXT: - String: 'SLP vectorized with cost '		; YAML-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
; YAML-NEXT: - Cost: '6'		; YAML-NEXT: - Cost: '-7'
; YAML-NEXT: - String: ' and with tree size '		; YAML-NEXT: - String: ' >= '
; YAML-NEXT: - TreeSize: '3'		; YAML-NEXT: - Treshold: '7'

define i32 @getelementptr_4x32(ptr nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {		define i32 @getelementptr_4x32(ptr nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {
; CHECK-LABEL: @getelementptr_4x32(		; CHECK-LABEL: @getelementptr_4x32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0		; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 0, i32 poison>, i32 [[X:%.*]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[Y:%.*]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[Z:%.*]], i64 1
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD16:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]		; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD16:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
; CHECK-NEXT: ret i32 [[SUM_0_LCSSA]]		; CHECK-NEXT: ret i32 [[SUM_0_LCSSA]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[SUM_032:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[ADD16]], [[FOR_BODY]] ]		; CHECK-NEXT: [[SUM_032:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[ADD16]], [[FOR_BODY]] ]
; CHECK-NEXT: [[T4:%.*]] = shl nuw nsw i32 [[INDVARS_IV]], 1		; CHECK-NEXT: [[T4:%.*]] = shl nuw nsw i32 [[INDVARS_IV]], 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[T4]], i64 0		; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[T4]] to i64
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[G:%.*]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP4]], [[TMP0]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i64 0
; CHECK-NEXT: [[TMP7:%.*]] = zext i32 [[TMP6]] to i64
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[G:%.*]], i64 [[TMP7]]
; CHECK-NEXT: [[T6:%.*]] = load i32, ptr [[ARRAYIDX]], align 4		; CHECK-NEXT: [[T6:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ADD1:%.*]] = add nsw i32 [[T6]], [[SUM_032]]		; CHECK-NEXT: [[ADD1:%.*]] = add nsw i32 [[T6]], [[SUM_032]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP5]], i64 1		; CHECK-NEXT: [[T7:%.*]] = add nsw i32 [[T4]], [[X:%.*]]
; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64		; CHECK-NEXT: [[TMP1:%.*]] = sext i32 [[T7]] to i64
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP9]]		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP1]]
; CHECK-NEXT: [[T8:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4		; CHECK-NEXT: [[T8:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[ADD1]], [[T8]]		; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[ADD1]], [[T8]]
; CHECK-NEXT: [[TMP10:%.*]] = add nsw <2 x i32> [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[T9:%.*]] = add nsw i32 [[T4]], [[Y:%.*]]
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i64 0		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[T9]] to i64
; CHECK-NEXT: [[TMP12:%.*]] = sext i32 [[TMP11]] to i64		; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP2]]
; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP12]]
; CHECK-NEXT: [[T10:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4		; CHECK-NEXT: [[T10:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4
; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD6]], [[T10]]		; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD6]], [[T10]]
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32> [[TMP10]], i64 1		; CHECK-NEXT: [[T11:%.*]] = add nsw i32 [[T4]], [[Z:%.*]]
; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64		; CHECK-NEXT: [[TMP3:%.*]] = sext i32 [[T11]] to i64
; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP14]]		; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP3]]
; CHECK-NEXT: [[T12:%.*]] = load i32, ptr [[ARRAYIDX15]], align 4		; CHECK-NEXT: [[T12:%.*]] = load i32, ptr [[ARRAYIDX15]], align 4
; CHECK-NEXT: [[ADD16]] = add nsw i32 [[ADD11]], [[T12]]		; CHECK-NEXT: [[ADD16]] = add nsw i32 [[ADD11]], [[T12]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i32 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i32 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
;		;
entry:		entry:
%cmp31 = icmp sgt i32 %n, 0		%cmp31 = icmp sgt i32 %n, 0
%t12 = load i32, ptr %arrayidx15, align 4		%t12 = load i32, ptr %arrayidx15, align 4
%add16 = add nsw i32 %add11, %t12		%add16 = add nsw i32 %add11, %t12
%indvars.iv.next = add nuw nsw i32 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i32 %indvars.iv, 1
%exitcond = icmp eq i32 %indvars.iv.next , %n		%exitcond = icmp eq i32 %indvars.iv.next , %n
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}		}

; YAML-LABEL: Function: getelementptr_2x32		; YAML-LABEL: Function: getelementptr_2x32
; YAML: --- !Passed		; YAML: --- !Missed
; YAML-NEXT: Pass: slp-vectorizer		; YAML-NEXT: Pass: slp-vectorizer
; YAML-NEXT: Name: VectorizedList		; YAML-NEXT: Name: NotBeneficial
; YAML-NEXT: Function: getelementptr_2x32		; YAML-NEXT: Function: getelementptr_2x32
; YAML-NEXT: Args:		; YAML-NEXT: Args:
; YAML-NEXT: - String: 'SLP vectorized with cost '		; YAML-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
; YAML-NEXT: - Cost: '6'		; YAML-NEXT: - Cost: '-7'
; YAML-NEXT: - String: ' and with tree size '		; YAML-NEXT: - String: ' >= '
; YAML-NEXT: - TreeSize: '3'		; YAML-NEXT: - Treshold: '7'

define i32 @getelementptr_2x32(ptr nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {		define i32 @getelementptr_2x32(ptr nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {
; CHECK-LABEL: @getelementptr_2x32(		; CHECK-LABEL: @getelementptr_2x32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0		; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0
; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]		; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
; CHECK: for.body.preheader:		; CHECK: for.body.preheader:
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[Y:%.*]], i64 0
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[Z:%.*]], i64 1
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD16:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]		; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD16:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
; CHECK-NEXT: ret i32 [[SUM_0_LCSSA]]		; CHECK-NEXT: ret i32 [[SUM_0_LCSSA]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[SUM_032:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[ADD16]], [[FOR_BODY]] ]		; CHECK-NEXT: [[SUM_032:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[ADD16]], [[FOR_BODY]] ]
; CHECK-NEXT: [[T4:%.*]] = shl nuw nsw i32 [[INDVARS_IV]], 1		; CHECK-NEXT: [[T4:%.*]] = shl nuw nsw i32 [[INDVARS_IV]], 1
; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[T4]] to i64		; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[T4]] to i64
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[G:%.*]], i64 [[TMP2]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[G:%.*]], i64 [[TMP0]]
; CHECK-NEXT: [[T6:%.*]] = load i32, ptr [[ARRAYIDX]], align 4		; CHECK-NEXT: [[T6:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ADD1:%.*]] = add nsw i32 [[T6]], [[SUM_032]]		; CHECK-NEXT: [[ADD1:%.*]] = add nsw i32 [[T6]], [[SUM_032]]
; CHECK-NEXT: [[T7:%.*]] = or i32 [[T4]], 1		; CHECK-NEXT: [[T7:%.*]] = or i32 [[T4]], 1
; CHECK-NEXT: [[TMP3:%.*]] = zext i32 [[T7]] to i64		; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[T7]] to i64
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP3]]		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP1]]
; CHECK-NEXT: [[T8:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4		; CHECK-NEXT: [[T8:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[ADD1]], [[T8]]		; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[ADD1]], [[T8]]
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[T4]], i64 0		; CHECK-NEXT: [[T9:%.*]] = add nsw i32 [[T4]], [[Y:%.*]]
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[T9]] to i64
; CHECK-NEXT: [[TMP6:%.*]] = add nsw <2 x i32> [[TMP5]], [[TMP1]]		; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP2]]
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP6]], i64 0
; CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP8]]
; CHECK-NEXT: [[T10:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4		; CHECK-NEXT: [[T10:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4
; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD6]], [[T10]]		; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD6]], [[T10]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i32> [[TMP6]], i64 1		; CHECK-NEXT: [[T11:%.*]] = add nsw i32 [[T4]], [[Z:%.*]]
; CHECK-NEXT: [[TMP10:%.*]] = sext i32 [[TMP9]] to i64		; CHECK-NEXT: [[TMP3:%.*]] = sext i32 [[T11]] to i64
; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP10]]		; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i32, ptr [[G]], i64 [[TMP3]]
; CHECK-NEXT: [[T12:%.*]] = load i32, ptr [[ARRAYIDX15]], align 4		; CHECK-NEXT: [[T12:%.*]] = load i32, ptr [[ARRAYIDX15]], align 4
; CHECK-NEXT: [[ADD16]] = add nsw i32 [[ADD11]], [[T12]]		; CHECK-NEXT: [[ADD16]] = add nsw i32 [[ADD11]], [[T12]]
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i32 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i32 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INDVARS_IV_NEXT]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
;		;
entry:		entry:
%cmp31 = icmp sgt i32 %n, 0		%cmp31 = icmp sgt i32 %n, 0
@global = internal global { ptr } zeroinitializer, align 8		@global = internal global { ptr } zeroinitializer, align 8

; Make sure we vectorize to maximize the load with when loading i16 and		; Make sure we vectorize to maximize the load with when loading i16 and
; extending it for compute operations.		; extending it for compute operations.
define void @test_i16_extend(ptr %p.1, ptr %p.2, i32 %idx.i32) {		define void @test_i16_extend(ptr %p.1, ptr %p.2, i32 %idx.i32) {
; CHECK-LABEL: @test_i16_extend(		; CHECK-LABEL: @test_i16_extend(
; CHECK-NEXT: [[P_0:%.*]] = load ptr, ptr @global, align 8		; CHECK-NEXT: [[P_0:%.*]] = load ptr, ptr @global, align 8
; CHECK-NEXT: [[IDX_0:%.*]] = zext i32 [[IDX_I32:%.*]] to i64		; CHECK-NEXT: [[IDX_0:%.*]] = zext i32 [[IDX_I32:%.*]] to i64
		; CHECK-NEXT: [[IDX_1:%.*]] = add nuw nsw i64 [[IDX_0]], 1
		; CHECK-NEXT: [[IDX_2:%.*]] = add nuw nsw i64 [[IDX_0]], 2
		; CHECK-NEXT: [[IDX_3:%.*]] = add nuw nsw i64 [[IDX_0]], 3
		; CHECK-NEXT: [[IDX_4:%.*]] = add nuw nsw i64 [[IDX_0]], 4
		; CHECK-NEXT: [[IDX_5:%.*]] = add nuw nsw i64 [[IDX_0]], 5
		; CHECK-NEXT: [[IDX_6:%.*]] = add nuw nsw i64 [[IDX_0]], 6
		; CHECK-NEXT: [[IDX_7:%.*]] = add nuw nsw i64 [[IDX_0]], 7
; CHECK-NEXT: [[T53:%.*]] = getelementptr inbounds i16, ptr [[P_1:%.*]], i64 [[IDX_0]]		; CHECK-NEXT: [[T53:%.*]] = getelementptr inbounds i16, ptr [[P_1:%.*]], i64 [[IDX_0]]
		; CHECK-NEXT: [[OP1_L:%.*]] = load i16, ptr [[T53]], align 2
		; CHECK-NEXT: [[OP1_EXT:%.*]] = zext i16 [[OP1_L]] to i64
; CHECK-NEXT: [[T56:%.*]] = getelementptr inbounds i16, ptr [[P_2:%.*]], i64 [[IDX_0]]		; CHECK-NEXT: [[T56:%.*]] = getelementptr inbounds i16, ptr [[P_2:%.*]], i64 [[IDX_0]]
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, ptr [[T53]], align 2		; CHECK-NEXT: [[OP2_L:%.*]] = load i16, ptr [[T56]], align 2
; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i16> [[TMP2]] to <8 x i32>		; CHECK-NEXT: [[OP2_EXT:%.*]] = zext i16 [[OP2_L]] to i64
; CHECK-NEXT: [[TMP5:%.*]] = load <8 x i16>, ptr [[T56]], align 2		; CHECK-NEXT: [[SUB_1:%.*]] = sub nsw i64 [[OP1_EXT]], [[OP2_EXT]]
; CHECK-NEXT: [[TMP6:%.*]] = zext <8 x i16> [[TMP5]] to <8 x i32>		; CHECK-NEXT: [[T60:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_1]]
; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <8 x i32> [[TMP3]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x i32> [[TMP7]], i64 0
; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64
; CHECK-NEXT: [[T60:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP9]]
; CHECK-NEXT: [[L_1:%.*]] = load i32, ptr [[T60]], align 4		; CHECK-NEXT: [[L_1:%.*]] = load i32, ptr [[T60]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <8 x i32> [[TMP7]], i64 1		; CHECK-NEXT: [[T64:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_1]]
; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64		; CHECK-NEXT: [[T65:%.*]] = load i16, ptr [[T64]], align 2
; CHECK-NEXT: [[T71:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP11]]		; CHECK-NEXT: [[T66:%.*]] = zext i16 [[T65]] to i64
		; CHECK-NEXT: [[T67:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_1]]
		; CHECK-NEXT: [[T68:%.*]] = load i16, ptr [[T67]], align 2
		; CHECK-NEXT: [[T69:%.*]] = zext i16 [[T68]] to i64
		; CHECK-NEXT: [[SUB_2:%.*]] = sub nsw i64 [[T66]], [[T69]]
		; CHECK-NEXT: [[T71:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_2]]
; CHECK-NEXT: [[L_2:%.*]] = load i32, ptr [[T71]], align 4		; CHECK-NEXT: [[L_2:%.*]] = load i32, ptr [[T71]], align 4
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <8 x i32> [[TMP7]], i64 2		; CHECK-NEXT: [[T75:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_2]]
; CHECK-NEXT: [[TMP13:%.*]] = sext i32 [[TMP12]] to i64		; CHECK-NEXT: [[T76:%.*]] = load i16, ptr [[T75]], align 2
; CHECK-NEXT: [[T82:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP13]]		; CHECK-NEXT: [[T77:%.*]] = zext i16 [[T76]] to i64
		; CHECK-NEXT: [[T78:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_2]]
		; CHECK-NEXT: [[T79:%.*]] = load i16, ptr [[T78]], align 2
		; CHECK-NEXT: [[T80:%.*]] = zext i16 [[T79]] to i64
		; CHECK-NEXT: [[SUB_3:%.*]] = sub nsw i64 [[T77]], [[T80]]
		; CHECK-NEXT: [[T82:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_3]]
; CHECK-NEXT: [[L_3:%.*]] = load i32, ptr [[T82]], align 4		; CHECK-NEXT: [[L_3:%.*]] = load i32, ptr [[T82]], align 4
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <8 x i32> [[TMP7]], i64 3		; CHECK-NEXT: [[T86:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_3]]
; CHECK-NEXT: [[TMP15:%.*]] = sext i32 [[TMP14]] to i64		; CHECK-NEXT: [[T87:%.*]] = load i16, ptr [[T86]], align 2
; CHECK-NEXT: [[T93:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP15]]		; CHECK-NEXT: [[T88:%.*]] = zext i16 [[T87]] to i64
		; CHECK-NEXT: [[T89:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_3]]
		; CHECK-NEXT: [[T90:%.*]] = load i16, ptr [[T89]], align 2
		; CHECK-NEXT: [[T91:%.*]] = zext i16 [[T90]] to i64
		; CHECK-NEXT: [[SUB_4:%.*]] = sub nsw i64 [[T88]], [[T91]]
		; CHECK-NEXT: [[T93:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_4]]
; CHECK-NEXT: [[L_4:%.*]] = load i32, ptr [[T93]], align 4		; CHECK-NEXT: [[L_4:%.*]] = load i32, ptr [[T93]], align 4
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[TMP7]], i64 4		; CHECK-NEXT: [[T97:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_4]]
; CHECK-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64		; CHECK-NEXT: [[T98:%.*]] = load i16, ptr [[T97]], align 2
; CHECK-NEXT: [[T104:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP17]]		; CHECK-NEXT: [[T99:%.*]] = zext i16 [[T98]] to i64
		; CHECK-NEXT: [[T100:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_4]]
		; CHECK-NEXT: [[T101:%.*]] = load i16, ptr [[T100]], align 2
		; CHECK-NEXT: [[T102:%.*]] = zext i16 [[T101]] to i64
		; CHECK-NEXT: [[SUB_5:%.*]] = sub nsw i64 [[T99]], [[T102]]
		; CHECK-NEXT: [[T104:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_5]]
; CHECK-NEXT: [[L_5:%.*]] = load i32, ptr [[T104]], align 4		; CHECK-NEXT: [[L_5:%.*]] = load i32, ptr [[T104]], align 4
; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[TMP7]], i64 5		; CHECK-NEXT: [[T108:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_5]]
; CHECK-NEXT: [[TMP19:%.*]] = sext i32 [[TMP18]] to i64		; CHECK-NEXT: [[T109:%.*]] = load i16, ptr [[T108]], align 2
; CHECK-NEXT: [[T115:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP19]]		; CHECK-NEXT: [[T110:%.*]] = zext i16 [[T109]] to i64
		; CHECK-NEXT: [[T111:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_5]]
		; CHECK-NEXT: [[T112:%.*]] = load i16, ptr [[T111]], align 2
		; CHECK-NEXT: [[T113:%.*]] = zext i16 [[T112]] to i64
		; CHECK-NEXT: [[SUB_6:%.*]] = sub nsw i64 [[T110]], [[T113]]
		; CHECK-NEXT: [[T115:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_6]]
; CHECK-NEXT: [[L_6:%.*]] = load i32, ptr [[T115]], align 4		; CHECK-NEXT: [[L_6:%.*]] = load i32, ptr [[T115]], align 4
; CHECK-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP7]], i64 6		; CHECK-NEXT: [[T119:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_6]]
; CHECK-NEXT: [[TMP21:%.*]] = sext i32 [[TMP20]] to i64		; CHECK-NEXT: [[T120:%.*]] = load i16, ptr [[T119]], align 2
; CHECK-NEXT: [[T126:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP21]]		; CHECK-NEXT: [[T121:%.*]] = zext i16 [[T120]] to i64
		; CHECK-NEXT: [[T122:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_6]]
		; CHECK-NEXT: [[T123:%.*]] = load i16, ptr [[T122]], align 2
		; CHECK-NEXT: [[T124:%.*]] = zext i16 [[T123]] to i64
		; CHECK-NEXT: [[SUB_7:%.*]] = sub nsw i64 [[T121]], [[T124]]
		; CHECK-NEXT: [[T126:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_7]]
; CHECK-NEXT: [[L_7:%.*]] = load i32, ptr [[T126]], align 4		; CHECK-NEXT: [[L_7:%.*]] = load i32, ptr [[T126]], align 4
; CHECK-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP7]], i64 7		; CHECK-NEXT: [[T130:%.*]] = getelementptr inbounds i16, ptr [[P_1]], i64 [[IDX_7]]
; CHECK-NEXT: [[TMP23:%.*]] = sext i32 [[TMP22]] to i64		; CHECK-NEXT: [[T131:%.*]] = load i16, ptr [[T130]], align 2
; CHECK-NEXT: [[T137:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[TMP23]]		; CHECK-NEXT: [[T132:%.*]] = zext i16 [[T131]] to i64
		; CHECK-NEXT: [[T133:%.*]] = getelementptr inbounds i16, ptr [[P_2]], i64 [[IDX_7]]
		; CHECK-NEXT: [[T134:%.*]] = load i16, ptr [[T133]], align 2
		; CHECK-NEXT: [[T135:%.*]] = zext i16 [[T134]] to i64
		; CHECK-NEXT: [[SUB_8:%.*]] = sub nsw i64 [[T132]], [[T135]]
		; CHECK-NEXT: [[T137:%.*]] = getelementptr inbounds i32, ptr [[P_0]], i64 [[SUB_8]]
; CHECK-NEXT: [[L_8:%.*]] = load i32, ptr [[T137]], align 4		; CHECK-NEXT: [[L_8:%.*]] = load i32, ptr [[T137]], align 4
; CHECK-NEXT: call void @use(i32 [[L_1]], i32 [[L_2]], i32 [[L_3]], i32 [[L_4]], i32 [[L_5]], i32 [[L_6]], i32 [[L_7]], i32 [[L_8]])		; CHECK-NEXT: call void @use(i32 [[L_1]], i32 [[L_2]], i32 [[L_3]], i32 [[L_4]], i32 [[L_5]], i32 [[L_6]], i32 [[L_7]], i32 [[L_8]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%p.0 = load ptr, ptr @global, align 8		%p.0 = load ptr, ptr @global, align 8

%idx.0 = zext i32 %idx.i32 to i64		%idx.0 = zext i32 %idx.i32 to i64
%idx.1 = add nsw i64 %idx.0, 1		%idx.1 = add nsw i64 %idx.0, 1

llvm/test/Transforms/SLPVectorizer/SystemZ/gep-indices.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z15 -passes=slp-vectorizer %s -S -o - \
				; RUN: | FileCheck %s
				;
				; Test that gep indices are not first vectorized and then extracted (into address registers).

				%StructTy = type { i8, i64, i64, i64, i64 }
				declare void @bar(ptr, ptr)

				define void @fun(ptr %Addr) {
				; CHECK-LABEL: @fun(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_COND:%.*]]
				; CHECK: for.cond:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[P2472:%.*]] = getelementptr inbounds [[STRUCTTY:%.*]], ptr [[ADDR:%.*]], i64 [[INDVARS_IV]], i32 3
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[P2472]], align 8
				; CHECK-NEXT: [[P3476:%.*]] = getelementptr inbounds [[STRUCTTY]], ptr [[ADDR]], i64 [[INDVARS_IV]], i32 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[P3476]], align 8
				; CHECK-NEXT: [[SEXT:%.*]] = shl i64 [[TMP0]], 32
				; CHECK-NEXT: [[IDXPROM495:%.*]] = ashr exact i64 [[SEXT]], 32
				; CHECK-NEXT: [[ARRAYIDX496:%.*]] = getelementptr inbounds [3 x float], ptr null, i64 [[IDXPROM495]]
				; CHECK-NEXT: [[SEXT4:%.*]] = shl i64 [[TMP1]], 32
				; CHECK-NEXT: [[IDXPROM499:%.*]] = ashr exact i64 [[SEXT4]], 32
				; CHECK-NEXT: [[ARRAYIDX500:%.*]] = getelementptr inbounds [3 x float], ptr null, i64 [[IDXPROM499]]
				; CHECK-NEXT: tail call void @bar(ptr noundef poison, ptr noundef [[ARRAYIDX500]])
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: br label [[FOR_COND]]
				;
				entry:
				br label %for.cond

				for.cond:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.cond ], [ 0, %entry ]
				%P2472 = getelementptr inbounds %StructTy, ptr %Addr, i64 %indvars.iv, i32 3
				%0 = load i64, ptr %P2472, align 8
				%P3476 = getelementptr inbounds %StructTy, ptr %Addr, i64 %indvars.iv, i32 4
				%1 = load i64, ptr %P3476, align 8
				%sext = shl i64 %0, 32
				%idxprom495 = ashr exact i64 %sext, 32
				%arrayidx496 = getelementptr inbounds [3 x float], ptr null, i64 %idxprom495
				%sext4 = shl i64 %1, 32
				%idxprom499 = ashr exact i64 %sext4, 32
				%arrayidx500 = getelementptr inbounds [3 x float], ptr null, i64 %idxprom499
				tail call void @bar(ptr noundef poison, ptr noundef %arrayidx500)
				%indvars.iv.next = add i64 %indvars.iv, 1
				br label %for.cond
				}

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

%shl10 = shl nuw i32 %conv9, 24		%shl10 = shl nuw i32 %conv9, 24
%or11 = or i32 %or7, %shl10		%or11 = or i32 %or7, %shl10
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(ptr nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(ptr nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X:%.*]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X:%.*]], i64 0, i64 2
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[X]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[X]], align 4
; CHECK-NEXT: [[X2:%.*]] = load float, ptr [[GEP2]], align 4		; CHECK-NEXT: [[X2:%.*]] = load float, ptr [[GEP2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP3]], float [[X2]], i32 2		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP2]], float [[X2]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2
%x0 = load float, ptr %x		%x0 = load float, ptr %x
%x1 = load float, ptr %gep1		%x1 = load float, ptr %gep1
%x2 = load float, ptr %gep2		%x2 = load float, ptr %gep2
%i0 = insertelement <4 x float> poison, float %x0, i32 0		%i0 = insertelement <4 x float> poison, float %x0, i32 0
%i1 = insertelement <4 x float> %i0, float %x1, i32 1		%i1 = insertelement <4 x float> %i0, float %x1, i32 1
%i2 = insertelement <4 x float> %i1, float %x2, i32 2		%i2 = insertelement <4 x float> %i1, float %x2, i32 2
%i3 = insertelement <4 x float> %i2, float %x2, i32 3		%i3 = insertelement <4 x float> %i2, float %x2, i32 3
ret <4 x float> %i3		ret <4 x float> %i3
}		}

define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref_alt(		; CHECK-LABEL: @PR16739_byref_alt(
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[X:%.*]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[X:%.*]], align 4
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]		; CHECK-NEXT: ret <4 x float> [[TMP2]]
;		;
%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1
%x0 = load float, ptr %x		%x0 = load float, ptr %x
%x1 = load float, ptr %gep1		%x1 = load float, ptr %gep1
%i0 = insertelement <4 x float> poison, float %x0, i32 0		%i0 = insertelement <4 x float> poison, float %x0, i32 0
%i1 = insertelement <4 x float> %i0, float %x0, i32 1		%i1 = insertelement <4 x float> %i0, float %x0, i32 1
%i2 = insertelement <4 x float> %i1, float %x1, i32 2		%i2 = insertelement <4 x float> %i1, float %x1, i32 2
%i3 = insertelement <4 x float> %i2, float %x1, i32 3		%i3 = insertelement <4 x float> %i2, float %x1, i32 3
%t13 = bitcast i32 %t12 to float		%t13 = bitcast i32 %t12 to float
%t14 = insertelement <4 x float> %t11, float %t13, i32 2		%t14 = insertelement <4 x float> %t11, float %t13, i32 2
%t15 = insertelement <4 x float> %t14, float %t13, i32 3		%t15 = insertelement <4 x float> %t14, float %t13, i32 3
ret <4 x float> %t15		ret <4 x float> %t15
}		}

define void @PR43578_prefer128(ptr %r, ptr %p, ptr %q) #0 {		define void @PR43578_prefer128(ptr %r, ptr %p, ptr %q) #0 {
; CHECK-LABEL: @PR43578_prefer128(		; CHECK-LABEL: @PR43578_prefer128(
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 2		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 1
; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 2		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[P]], align 2		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 3
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[Q]], align 2		; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 1
; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 2
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, ptr [[P2]], align 2		; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 3
; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i64>, ptr [[Q2]], align 2		; CHECK-NEXT: [[X0:%.*]] = load i64, ptr [[P]], align 2
; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP7]], [[TMP9]]		; CHECK-NEXT: [[X1:%.*]] = load i64, ptr [[P1]], align 2
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0		; CHECK-NEXT: [[X2:%.*]] = load i64, ptr [[P2]], align 2
; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[TMP11]]		; CHECK-NEXT: [[X3:%.*]] = load i64, ptr [[P3]], align 2
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1		; CHECK-NEXT: [[Y0:%.*]] = load i64, ptr [[Q]], align 2
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP12]]		; CHECK-NEXT: [[Y1:%.*]] = load i64, ptr [[Q1]], align 2
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0		; CHECK-NEXT: [[Y2:%.*]] = load i64, ptr [[Q2]], align 2
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP13]]		; CHECK-NEXT: [[Y3:%.*]] = load i64, ptr [[Q3]], align 2
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1		; CHECK-NEXT: [[SUB0:%.*]] = sub nsw i64 [[X0]], [[Y0]]
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP14]]		; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i64 [[X1]], [[Y1]]
		; CHECK-NEXT: [[SUB2:%.*]] = sub nsw i64 [[X2]], [[Y2]]
		; CHECK-NEXT: [[SUB3:%.*]] = sub nsw i64 [[X3]], [[Y3]]
		; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[SUB0]]
		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB1]]
		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB2]]
		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB3]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%p1 = getelementptr inbounds i64, ptr %p, i64 1		%p1 = getelementptr inbounds i64, ptr %p, i64 1
%p2 = getelementptr inbounds i64, ptr %p, i64 2		%p2 = getelementptr inbounds i64, ptr %p, i64 2
%p3 = getelementptr inbounds i64, ptr %p, i64 3		%p3 = getelementptr inbounds i64, ptr %p, i64 3

%q1 = getelementptr inbounds i64, ptr %q, i64 1		%q1 = getelementptr inbounds i64, ptr %q, i64 1
%q2 = getelementptr inbounds i64, ptr %q, i64 2		%q2 = getelementptr inbounds i64, ptr %q, i64 2

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

%shl10 = shl nuw i32 %conv9, 24		%shl10 = shl nuw i32 %conv9, 24
%or11 = or i32 %or7, %shl10		%or11 = or i32 %or7, %shl10
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(ptr nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(ptr nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X:%.*]], i64 0, i64 2		; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds <4 x float>, ptr [[X:%.*]], i64 0, i64 2
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[X]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[X]], align 4
; CHECK-NEXT: [[X2:%.*]] = load float, ptr [[GEP2]], align 4		; CHECK-NEXT: [[X2:%.*]] = load float, ptr [[GEP2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP3]], float [[X2]], i32 2		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP2]], float [[X2]], i32 2
; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[I3]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 2
%x0 = load float, ptr %x		%x0 = load float, ptr %x
%x1 = load float, ptr %gep1		%x1 = load float, ptr %gep1
%x2 = load float, ptr %gep2		%x2 = load float, ptr %gep2
%i0 = insertelement <4 x float> undef, float %x0, i32 0		%i0 = insertelement <4 x float> undef, float %x0, i32 0
%i1 = insertelement <4 x float> %i0, float %x1, i32 1		%i1 = insertelement <4 x float> %i0, float %x1, i32 1
%i2 = insertelement <4 x float> %i1, float %x2, i32 2		%i2 = insertelement <4 x float> %i1, float %x2, i32 2
%i3 = insertelement <4 x float> %i2, float %x2, i32 3		%i3 = insertelement <4 x float> %i2, float %x2, i32 3
ret <4 x float> %i3		ret <4 x float> %i3
}		}

define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref_alt(ptr nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref_alt(		; CHECK-LABEL: @PR16739_byref_alt(
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[X:%.*]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[X:%.*]], align 4
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
; CHECK-NEXT: ret <4 x float> [[SHUFFLE]]		; CHECK-NEXT: ret <4 x float> [[TMP2]]
;		;
%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, ptr %x, i64 0, i64 1
%x0 = load float, ptr %x		%x0 = load float, ptr %x
%x1 = load float, ptr %gep1		%x1 = load float, ptr %gep1
%i0 = insertelement <4 x float> undef, float %x0, i32 0		%i0 = insertelement <4 x float> undef, float %x0, i32 0
%i1 = insertelement <4 x float> %i0, float %x0, i32 1		%i1 = insertelement <4 x float> %i0, float %x0, i32 1
%i2 = insertelement <4 x float> %i1, float %x1, i32 2		%i2 = insertelement <4 x float> %i1, float %x1, i32 2
%i3 = insertelement <4 x float> %i2, float %x1, i32 3		%i3 = insertelement <4 x float> %i2, float %x1, i32 3
%t13 = bitcast i32 %t12 to float		%t13 = bitcast i32 %t12 to float
%t14 = insertelement <4 x float> %t11, float %t13, i32 2		%t14 = insertelement <4 x float> %t11, float %t13, i32 2
%t15 = insertelement <4 x float> %t14, float %t13, i32 3		%t15 = insertelement <4 x float> %t14, float %t13, i32 3
ret <4 x float> %t15		ret <4 x float> %t15
}		}

define void @PR43578_prefer128(ptr %r, ptr %p, ptr %q) #0 {		define void @PR43578_prefer128(ptr %r, ptr %p, ptr %q) #0 {
; CHECK-LABEL: @PR43578_prefer128(		; CHECK-LABEL: @PR43578_prefer128(
; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 2		; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 1
; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 2		; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 2
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, ptr [[P]], align 2		; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 3
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[Q]], align 2		; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 1
; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i64> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 2
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, ptr [[P2]], align 2		; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 3
; CHECK-NEXT: [[TMP9:%.*]] = load <2 x i64>, ptr [[Q2]], align 2		; CHECK-NEXT: [[X0:%.*]] = load i64, ptr [[P]], align 2
; CHECK-NEXT: [[TMP10:%.*]] = sub nsw <2 x i64> [[TMP7]], [[TMP9]]		; CHECK-NEXT: [[X1:%.*]] = load i64, ptr [[P1]], align 2
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0		; CHECK-NEXT: [[X2:%.*]] = load i64, ptr [[P2]], align 2
; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[TMP11]]		; CHECK-NEXT: [[X3:%.*]] = load i64, ptr [[P3]], align 2
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1		; CHECK-NEXT: [[Y0:%.*]] = load i64, ptr [[Q]], align 2
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP12]]		; CHECK-NEXT: [[Y1:%.*]] = load i64, ptr [[Q1]], align 2
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP10]], i32 0		; CHECK-NEXT: [[Y2:%.*]] = load i64, ptr [[Q2]], align 2
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP13]]		; CHECK-NEXT: [[Y3:%.*]] = load i64, ptr [[Q3]], align 2
; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP10]], i32 1		; CHECK-NEXT: [[SUB0:%.*]] = sub nsw i64 [[X0]], [[Y0]]
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP14]]		; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i64 [[X1]], [[Y1]]
		; CHECK-NEXT: [[SUB2:%.*]] = sub nsw i64 [[X2]], [[Y2]]
		; CHECK-NEXT: [[SUB3:%.*]] = sub nsw i64 [[X3]], [[Y3]]
		; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[SUB0]]
		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB1]]
		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB2]]
		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB3]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%p1 = getelementptr inbounds i64, ptr %p, i64 1		%p1 = getelementptr inbounds i64, ptr %p, i64 1
%p2 = getelementptr inbounds i64, ptr %p, i64 2		%p2 = getelementptr inbounds i64, ptr %p, i64 2
%p3 = getelementptr inbounds i64, ptr %p, i64 3		%p3 = getelementptr inbounds i64, ptr %p, i64 3

%q1 = getelementptr inbounds i64, ptr %q, i64 1		%q1 = getelementptr inbounds i64, ptr %q, i64 1
%q2 = getelementptr inbounds i64, ptr %q, i64 2		%q2 = getelementptr inbounds i64, ptr %q, i64 2

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

	; the SLP threshold to force vectorization even when not profitable.			; the SLP threshold to force vectorization even when not profitable.

	; When computing minimum sizes, if we can prove the sign bit is zero, we can			; When computing minimum sizes, if we can prove the sign bit is zero, we can
	; zero-extend the roots back to their original sizes.			; zero-extend the roots back to their original sizes.
	;			;
	define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, ptr %ptr) {			define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, ptr %ptr) {
	; CHECK-LABEL: @PR31243_zext(			; CHECK-LABEL: @PR31243_zext(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i64 0			; CHECK-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>			; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i64 0			; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, ptr [[PTR:%.*]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i64			; CHECK-NEXT: [[TMP3:%.*]] = zext i8 [[TMP1]] to i64
	; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, ptr [[PTR:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i64 1
	; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i64
	; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP_6:%.*]] = load i8, ptr [[TMP_4]], align 1			; CHECK-NEXT: [[TMP_6:%.*]] = load i8, ptr [[TMP_4]], align 1
	; CHECK-NEXT: [[TMP_7:%.*]] = load i8, ptr [[TMP_5]], align 1			; CHECK-NEXT: [[TMP_7:%.*]] = load i8, ptr [[TMP_5]], align 1
	; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]			; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]
	; CHECK-NEXT: ret i8 [[TMP_8]]			; CHECK-NEXT: ret i8 [[TMP_8]]
	;			;
	entry:			entry:
	%tmp_0 = zext i8 %v0 to i32			%tmp_0 = zext i8 %v0 to i32
	%tmp_1 = zext i8 %v1 to i32			%tmp_1 = zext i8 %v1 to i32
	; if we can't prove that the upper bit of the original type is equal to			; if we can't prove that the upper bit of the original type is equal to
	; the upper bit of the proposed smaller type. If these two bits are the			; the upper bit of the proposed smaller type. If these two bits are the
	; same (either zero or one) we know that sign-extending from the smaller			; same (either zero or one) we know that sign-extending from the smaller
	; type will result in the same value. Since we don't yet perform this			; type will result in the same value. Since we don't yet perform this
	; optimization, we make the proposed smaller type (i8) larger (i16) to			; optimization, we make the proposed smaller type (i8) larger (i16) to
	; ensure correctness.			; ensure correctness.
	;			;
	define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, ptr %ptr) {			define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, ptr %ptr) {
	; SSE-LABEL: @PR31243_sext(			; CHECK-LABEL: @PR31243_sext(
	; SSE-NEXT: entry:			; CHECK-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1
	; SSE-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1			; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1
	; SSE-NEXT: [[TMP2:%.*]] = sext i8 [[TMP0]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[TMP0]] to i64
	; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[PTR:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[PTR:%.*]], i64 [[TMP2]]
	; SSE-NEXT: [[TMP3:%.*]] = sext i8 [[TMP1]] to i64			; CHECK-NEXT: [[TMP3:%.*]] = sext i8 [[TMP1]] to i64
	; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[TMP3]]
	; SSE-NEXT: [[TMP6:%.*]] = load i8, ptr [[TMP4]], align 1			; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[TMP4]], align 1
	; SSE-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP5]], align 1			; CHECK-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP5]], align 1
	; SSE-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; SSE-NEXT: ret i8 [[TMP8]]			; CHECK-NEXT: ret i8 [[TMP8]]
	;
	; AVX-LABEL: @PR31243_sext(
	; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i64 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i64 1
	; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i16>
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP3]], i64 0
	; AVX-NEXT: [[TMP5:%.*]] = sext i16 [[TMP4]] to i64
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[PTR:%.*]], i64 [[TMP5]]
	; AVX-NEXT: [[TMP6:%.*]] = extractelement <2 x i16> [[TMP3]], i64 1
	; AVX-NEXT: [[TMP7:%.*]] = sext i16 [[TMP6]] to i64
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[PTR]], i64 [[TMP7]]
	; AVX-NEXT: [[TMP6:%.*]] = load i8, ptr [[TMP4]], align 1
	; AVX-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP5]], align 1
	; AVX-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: ret i8 [[TMP8]]
	;			;
	entry:			entry:
	%tmp0 = sext i8 %v0 to i32			%tmp0 = sext i8 %v0 to i32
	%tmp1 = sext i8 %v1 to i32			%tmp1 = sext i8 %v1 to i32
	%tmp2 = or i32 %tmp0, 1			%tmp2 = or i32 %tmp0, 1
	%tmp3 = or i32 %tmp1, 1			%tmp3 = or i32 %tmp1, 1
	%tmp4 = getelementptr inbounds i8, ptr %ptr, i32 %tmp2			%tmp4 = getelementptr inbounds i8, ptr %ptr, i32 %tmp2
	%tmp5 = getelementptr inbounds i8, ptr %ptr, i32 %tmp3			%tmp5 = getelementptr inbounds i8, ptr %ptr, i32 %tmp3
	%tmp6 = load i8, ptr %tmp4			%tmp6 = load i8, ptr %tmp4
	%tmp7 = load i8, ptr %tmp5			%tmp7 = load i8, ptr %tmp5
	%tmp8 = add i8 %tmp6, %tmp7			%tmp8 = add i8 %tmp6, %tmp7
	ret i8 %tmp8			ret i8 %tmp8
	}			}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; AVX: {{.*}}
				; SSE: {{.*}}

llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mcpu=haswell -opaque-pointers < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mcpu=haswell -opaque-pointers < %s | FileCheck %s

	define void @test(ptr %r, ptr %p, ptr %q) #0 {			define void @test(ptr %r, ptr %p, ptr %q) #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 0
				; CHECK-NEXT: [[P1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 1
				; CHECK-NEXT: [[P2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 2
				; CHECK-NEXT: [[P3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 3
	; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 0			; CHECK-NEXT: [[Q0:%.*]] = getelementptr inbounds i64, ptr [[Q:%.*]], i64 0
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, ptr [[P0]], align 2			; CHECK-NEXT: [[Q1:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x i64>, ptr [[Q0]], align 2			; CHECK-NEXT: [[Q2:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 2
	; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i64> [[TMP1]], [[TMP2]]			; CHECK-NEXT: [[Q3:%.*]] = getelementptr inbounds i64, ptr [[Q]], i64 3
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP3]], i32 0			; CHECK-NEXT: [[X0:%.*]] = load i64, ptr [[P0]], align 2
	; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[X1:%.*]] = load i64, ptr [[P1]], align 2
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[TMP3]], i32 1			; CHECK-NEXT: [[X2:%.*]] = load i64, ptr [[P2]], align 2
	; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP5]]			; CHECK-NEXT: [[X3:%.*]] = load i64, ptr [[P3]], align 2
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP3]], i32 2			; CHECK-NEXT: [[Y0:%.*]] = load i64, ptr [[Q0]], align 2
	; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP6]]			; CHECK-NEXT: [[Y1:%.*]] = load i64, ptr [[Q1]], align 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP3]], i32 3			; CHECK-NEXT: [[Y2:%.*]] = load i64, ptr [[Q2]], align 2
	; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[TMP7]]			; CHECK-NEXT: [[Y3:%.*]] = load i64, ptr [[Q3]], align 2
				; CHECK-NEXT: [[SUB0:%.*]] = sub nsw i64 [[X0]], [[Y0]]
				; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i64 [[X1]], [[Y1]]
				; CHECK-NEXT: [[SUB2:%.*]] = sub nsw i64 [[X2]], [[Y2]]
				; CHECK-NEXT: [[SUB3:%.*]] = sub nsw i64 [[X3]], [[Y3]]
				; CHECK-NEXT: [[G0:%.*]] = getelementptr inbounds i32, ptr [[R:%.*]], i64 [[SUB0]]
				; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB1]]
				; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB2]]
				; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[SUB3]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%p0 = getelementptr inbounds i64, ptr %p, i64 0			%p0 = getelementptr inbounds i64, ptr %p, i64 0
	%p1 = getelementptr inbounds i64, ptr %p, i64 1			%p1 = getelementptr inbounds i64, ptr %p, i64 1
	%p2 = getelementptr inbounds i64, ptr %p, i64 2			%p2 = getelementptr inbounds i64, ptr %p, i64 2
	%p3 = getelementptr inbounds i64, ptr %p, i64 3			%p3 = getelementptr inbounds i64, ptr %p, i64 3

	%q0 = getelementptr inbounds i64, ptr %q, i64 0			%q0 = getelementptr inbounds i64, ptr %q, i64 0

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @get_block(i32 %y_pos) local_unnamed_addr #0 {			define void @get_block(i32 %y_pos) local_unnamed_addr #0 {
	; CHECK-LABEL: @get_block(			; CHECK-LABEL: @get_block(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp sgt i32 [[SHR15]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[COND_I_I:%.*]] = select i1 [[CMP_I_I]], i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[CMP_I4_I:%.*]] = icmp slt i32 [[COND_I_I]], undef
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[COND_I5_I:%.*]] = select i1 [[CMP_I4_I]], i32 [[COND_I_I]], i32 undef
	; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i32> [[TMP0]]			; CHECK-NEXT: [[IDXPROM30:%.*]] = sext i32 [[COND_I5_I]] to i64
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[IDXPROM30]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], undef			; CHECK-NEXT: [[CMP_I_I_1:%.*]] = icmp sgt i32 [[SUB14]], -1
	; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP4]], <4 x i32> undef			; CHECK-NEXT: [[COND_I_I_1:%.*]] = select i1 [[CMP_I_I_1]], i32 undef, i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i32> [[TMP6]] to <4 x i64>			; CHECK-NEXT: [[CMP_I4_I_1:%.*]] = icmp slt i32 [[COND_I_I_1]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = trunc <4 x i64> [[TMP7]] to <4 x i32>			; CHECK-NEXT: [[COND_I5_I_1:%.*]] = select i1 [[CMP_I4_I_1]], i32 [[COND_I_I_1]], i32 undef
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 0			; CHECK-NEXT: [[IDXPROM30_1:%.*]] = sext i32 [[COND_I5_I_1]] to i64
	; CHECK-NEXT: [[TMP10:%.*]] = sext i32 [[TMP9]] to i64			; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[IDXPROM30_1]]
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[TMP10]]			; CHECK-NEXT: [[CMP_I_I_2:%.*]] = icmp sgt i32 [[SUB14]], -5
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP8]], i32 1			; CHECK-NEXT: [[COND_I_I_2:%.*]] = select i1 [[CMP_I_I_2]], i32 undef, i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = sext i32 [[TMP11]] to i64			; CHECK-NEXT: [[CMP_I4_I_2:%.*]] = icmp slt i32 [[COND_I_I_2]], undef
	; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[TMP12]]			; CHECK-NEXT: [[COND_I5_I_2:%.*]] = select i1 [[CMP_I4_I_2]], i32 [[COND_I_I_2]], i32 undef
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP8]], i32 2			; CHECK-NEXT: [[IDXPROM30_2:%.*]] = sext i32 [[COND_I5_I_2]] to i64
	; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64			; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[IDXPROM30_2]]
	; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[TMP14]]			; CHECK-NEXT: [[CMP_I_I_3:%.*]] = icmp sgt i32 [[SUB14]], -9
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3			; CHECK-NEXT: [[COND_I_I_3:%.*]] = select i1 [[CMP_I_I_3]], i32 undef, i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64			; CHECK-NEXT: [[CMP_I4_I_3:%.*]] = icmp slt i32 [[COND_I_I_3]], undef
	; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[TMP16]]			; CHECK-NEXT: [[COND_I5_I_3:%.*]] = select i1 [[CMP_I4_I_3]], i32 [[COND_I_I_3]], i32 undef
				; CHECK-NEXT: [[IDXPROM30_3:%.*]] = sext i32 [[COND_I5_I_3]] to i64
				; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds ptr, ptr undef, i64 [[IDXPROM30_3]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br label %land.lhs.true			br label %land.lhs.true

	land.lhs.true: ; preds = %entry			land.lhs.true: ; preds = %entry
	br i1 undef, label %if.then, label %if.end			br i1 undef, label %if.then, label %if.end
