This is an archive of the discontinued LLVM Phabricator instance.


Differential D90687

[LV] Clamp VF hint when unsafe
ClosedPublic

Authored by c-rhodes on Nov 3 2020, 8:07 AM.


Details

Reviewers

sdesmalen
efriedma
fhahn
mkuper
RKSimon
spatel
Meinersbur

Commits

rGcba4accda08f: [LV] Clamp VF hint when unsafe

Summary

In the following loop the dependence distance is 2, so the loop can only be
vectorized if the vector length is no larger than 2.

void foo(int *a, int *b, int N) {
  #pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i=0; i<N; ++i) {
    a[i + 2] = a[i] + b[i];
  }
}

However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.
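To see why a distance of 2 caps the width, the sketch below emulates the loop in plain C++: a scalar reference, and a chunked version that loads VF lanes before storing any of them, roughly as a vector loop would. All helper names are invented for illustration, and the emulation ignores what LoopVectorize really does (runtime checks, interleaving and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: a[i + 2] = a[i] + b[i] for i in [0, n).
// `a` must hold at least n + 2 elements.
std::vector<int> runScalar(std::vector<int> a, const std::vector<int> &b,
                           std::size_t n) {
  assert(a.size() >= n + 2 && b.size() >= n);
  for (std::size_t i = 0; i < n; ++i)
    a[i + 2] = a[i] + b[i];
  return a;
}

// Emulated vector loop: each step loads `vf` lanes of a and b, then stores
// all `vf` results. Leftover iterations run as scalar code.
std::vector<int> runChunked(std::vector<int> a, const std::vector<int> &b,
                            std::size_t n, std::size_t vf) {
  assert(vf > 0 && a.size() >= n + 2 && b.size() >= n);
  std::size_t i = 0;
  for (; i + vf <= n; i += vf) {
    std::vector<int> sum(vf);
    for (std::size_t lane = 0; lane < vf; ++lane) // wide load + add
      sum[lane] = a[i + lane] + b[i + lane];
    for (std::size_t lane = 0; lane < vf; ++lane) // wide store
      a[i + lane + 2] = sum[lane];
  }
  for (; i < n; ++i) // scalar remainder
    a[i + 2] = a[i] + b[i];
  return a;
}

// True when width `vf` reproduces the scalar result on a small input.
bool matchesScalar(std::size_t vf) {
  const std::size_t n = 8;
  std::vector<int> a(n + 2, 1), b(n, 1);
  return runChunked(a, b, n, vf) == runScalar(a, b, n);
}
```

With this input `matchesScalar(2)` holds and `matchesScalar(4)` does not: at width 4 the wide load reads a[i + 2] and a[i + 3] before the same step has stored them.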

This patch introduces a check to bail out of vectorization if the
user-specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.

[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

Diff Detail

Event Timeline

c-rhodes created this revision. Nov 3 2020, 8:07 AM

Herald added a project: Restricted Project. Nov 3 2020, 8:07 AM

Herald added a subscriber: hiraditya.

c-rhodes requested review of this revision. Nov 3 2020, 8:07 AM

Harbormaster completed remote builds in B77412: Diff 302584. Nov 3 2020, 9:34 AM

dmgreen added a reviewer: Meinersbur. Nov 3 2020, 9:35 AM

dmgreen added a subscriber: dmgreen.

dmgreen added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing. It should probably be based on Legal.getMaxSafeRegisterWidth()?

fhahn added inline comments. Nov 3 2020, 9:46 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	> It should probably be based on Legal.getMaxSafeRegisterWidth()?

	computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint, if it is not useful, why not just use the largest safe vectorization factors instead of bailing out?

	Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
192	Why those test changes?

#pragma clang loop vectorize_width(..) ignoring safety checks is indeed bad. Instead of not vectorizing at all in this case, did you consider using min(UserVF,FeasibleVF) instead?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	`Legal.getMaxSafeRegisterWidth()` is called within `computeFeasibleMaxVF`. With `computeFeasibleMaxVF` considering additional architecture concerns, can these be just ignored?
5238	With `computeFeasibleMaxVF` already called later, could you store its result for both uses? I suggest moving `UserVF ? UserVF : computeFeasibleMaxVF(TC)`, which is duplicated multiple times below, before this condition. Such as:

	auto MaxVF = computeFeasibleMaxVF(TC);
	if (UserVF) {
	  if (UserVF > MaxVF) {
	    ...
	  }
	  MaxVF = UserVF;
	}

c-rhodes added inline comments. Nov 3 2020, 12:48 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	>> It should probably be based on Legal.getMaxSafeRegisterWidth()?
	> computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint, if it is not useful, why not just use the largest safe vectorization factors instead of bailing out?

	Yeah that's right, `computeFeasibleMaxVF` is based on `Legal->getMaxSafeRegisterWidth()` and bounded by the widest register according to the TTI.

	> Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.

	Sure, clamping sounds good to me and @Meinersbur suggested this as well, I'll update the patch.
5238	> I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing.

	I did find that the X86 tests that changed did so because of a backend vector width of 128-bit. I expect these changes will still be required when clamping the UserVF to the maximum vectorization factor as suggested.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
192	> Why those test changes?

	The maximum VF=2, it's computed as `WidestRegister / WidestType` where the widest register is 128-bit.
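As a worked instance of the `WidestRegister / WidestType` arithmetic mentioned in this reply, here is a minimal sketch. The function name is made up, and the 64-bit element width used in the note afterwards is an assumption: the thread only states the 128-bit register and the resulting VF of 2.

```cpp
#include <cassert>

// Number of elements of the widest type that fit into the widest vector
// register. Both widths are given in bits.
unsigned registerBoundVF(unsigned widestRegisterBits, unsigned widestTypeBits) {
  assert(widestTypeBits > 0 && "element type must have a non-zero width");
  return widestRegisterBits / widestTypeBits;
}
```

`registerBoundVF(128, 64)` evaluates to 2, which would be consistent with the reported maximum if the widest type in those tests is 64 bits wide.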

Rather than bail out of vectorization if UserVF > MaxVF, clamp UserVF to the maximum feasible VF.
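The change in approach described in this update can be sketched as follows. This is only an illustration of the min(UserVF, MaxVF) idea suggested in the review, with a hypothetical function name; it is not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>

// Given a non-zero user hint and the maximum feasible VF, keep the hint when
// it fits and otherwise fall back to the maximum instead of giving up.
unsigned clampUserVF(unsigned userVF, unsigned maxFeasibleVF) {
  assert(userVF > 0 && maxFeasibleVF > 0);
  return std::min(userVF, maxFeasibleVF);
}
```

For the loop in the summary (hint 4, maximum 2), `clampUserVF(4, 2)` yields 2, so the loop is still vectorized, just at the narrower width.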

dmgreen added inline comments. Nov 4 2020, 10:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	Clamping sounds good. But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers: https://godbolt.org/z/ePdv3K (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here).

Address @dmgreen's comments, response in thread.

c-rhodes added inline comments. Nov 5 2020, 7:40 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.

	So only considering the VF computed by LAA for dependencies rather than the backend register widths, I think that makes sense.

	Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint:

	void foo(int *a, int *b, int N) {
	  for (int i=0; i<N; ++i) {
	    a[i + 2] = a[i] + b[i];
	  }
	}

	clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4
	../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
	  for (int i=0; i<N; ++i) {
	  ^

	It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma LAA is operating on VF=2 and LV on VF=4. What's also interesting is the loop metadata takes precedence over the flag since in `LoopVectorizationLegality` the vector width is initialized with the flag then populated with loop metadata, so the VF according to LAA would come from the flag and in LV the loop hint, assuming both were specified by the user that is.

	I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
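The precedence described in this comment (width initialized from the flag, then overwritten by loop metadata, with LAA only seeing the flag) can be modelled roughly like this. The struct and function are hypothetical and use 0 to mean "not specified"; they do not correspond to actual LLVM data structures.

```cpp
#include <cassert>

// Hypothetical model of the two views of the user-requested VF.
struct UserVFViews {
  unsigned seenByLAA; // LAA only knows about the command-line flag
  unsigned seenByLV;  // LV uses the flag unless loop metadata overrides it
};

UserVFViews resolveUserVF(unsigned flagVF, unsigned metadataVF) {
  UserVFViews views{flagVF, flagVF}; // width is initialized from the flag
  if (metadataVF != 0)               // then populated from loop metadata
    views.seenByLV = metadataVF;
  return views;
}
```

With only a pragma requesting 4, `resolveUserVF(0, 4)` leaves `seenByLAA` at 0 and sets `seenByLV` to 4, which is the mismatch being pointed out; if both are given, the metadata wins for LV while LAA still sees the flag.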

sdesmalen added inline comments. Nov 5 2020, 7:46 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	That's a good point @dmgreen, `MaxSafeRegisterWidth` and corresponding `getMaxSafeRegisterWidth` in LoopVectorize.cpp and LoopAccessAnalysis.cpp are actually misnomers because it isn't the maximum safe register width that is calculated, but rather the maximum safe vector bitwidth.

	Without this patch, this example is vectorized with VF=8 when compiling for Neon (128bit vectors):

	void foo(int *a, int *b, int c, int N) {
	  #pragma clang loop vectorize(enable) vectorize_width(8)
	  for (int i=0; i<N; ++i) {
	    a[i + 16] = a[i] + b[i];
	  }
	}

	Where with this patch, it is now vectorized with VF=4. It seems like the limitation with regards to the actual physical vector register can and should be removed.
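The distinction between the safety bound and the register bound in this example can be sketched with a simplified model. It assumes a unit-stride loop with a single dependence, for which the number of safe lanes is taken to be the dependence distance in elements; all function names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Simplified safety bound: with one dependence of `distElems` elements,
// up to `distElems` lanes can be processed together.
unsigned maxSafeVF(unsigned distElems) { return distElems; }

// Lanes that fit in one physical vector register.
unsigned registerVF(unsigned registerBits, unsigned elemBits) {
  assert(elemBits > 0);
  return registerBits / elemBits;
}

// Clamp the hint against safety only.
unsigned clampToSafe(unsigned userVF, unsigned distElems) {
  return std::min(userVF, maxSafeVF(distElems));
}

// Clamp the hint against safety and the register width.
unsigned clampToSafeAndRegister(unsigned userVF, unsigned distElems,
                                unsigned registerBits, unsigned elemBits) {
  return std::min(clampToSafe(userVF, distElems),
                  registerVF(registerBits, elemBits));
}
```

For the Neon example (hint 8, distance 16, 128-bit registers, 32-bit int), `clampToSafe(8, 16)` gives 8 whereas `clampToSafeAndRegister(8, 16, 128, 32)` gives 4, matching the VF=8 versus VF=4 difference reported above.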

sdesmalen added inline comments. Nov 5 2020, 7:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?

	To me they read as slightly different things, but it's good to clear their semantics up. I interpret:
	`-force-vector-width` as "vectorize it with this width and this width only, and fail if not legal".
	The `LoopHint` as "try to vectorize with this width but if not legal, feel free to ignore the hint and pick a different width".

	At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?
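The two readings in this comment can be contrasted in a small sketch. The function names are hypothetical, 0 stands for "do not vectorize", and falling back to the maximum safe width is just one possible choice of "a different width".

```cpp
#include <cassert>

// Reading of -force-vector-width: this width and this width only.
// Returns 0 (no vectorization) when the requested width is not safe.
unsigned resolveForced(unsigned requestedVF, unsigned maxSafeVF) {
  assert(requestedVF > 0);
  return requestedVF <= maxSafeVF ? requestedVF : 0;
}

// Reading of the loop hint: try this width, otherwise pick another one.
// Here the fallback is the largest safe width.
unsigned resolveHint(unsigned requestedVF, unsigned maxSafeVF) {
  assert(requestedVF > 0);
  return requestedVF <= maxSafeVF ? requestedVF : maxSafeVF;
}
```

For a request of 4 with a maximum safe width of 2, `resolveForced(4, 2)` returns 0 and `resolveHint(4, 2)` returns 2; the two only differ when the request is unsafe.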

Meinersbur added inline comments. Nov 9 2020, 4:26 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	`-force-vector-width` is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.
5348	[style] LLVM coding style prefers explicit types in declarations.
5352–5354	Should there be (also) a diagnostic warning (-Rpass) to inform the user that the value has been clamped? (Or maybe there is already and I don't see where it is done)

fhahn added inline comments. Nov 10 2020, 2:35 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> I've updated the patch such that computeFeasibleMaxVF now takes UserVF and clamps this to VF based on Legal->getMaxSafeRegisterWidth(); if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of UserVF needs refactoring. Now I know -force-vector-width won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?

	The logic looks good I think. My only concern is that passing UserVF even further down seems to make handling of it even more complicated to follow, but I don't think there's a good way around that because we need some info not available here.

	> At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?

	> -force-vector-width is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.

	Yes that is how it is used today AFAIK, to write LV tests independent of cost-modeling. Agreed with @Meinersbur that it should not matter for regression tests, because I think it is mostly used for testing codegen in legal scenarios. If we decide to adjust the behavior, that's probably best done in a separate patch.
5352–5354	I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (`OptimizationRemarkAnalysis` seems to be a suitable category here).

Fix style issue.
Add optimization remark when clamping.
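A rough sketch of what "remark when clamping" means in practice is shown below. The `RemarkLog` type is a stand-in for a remark sink and the message text is illustrative; neither is the real LLVM optimization-remark machinery, which is not modelled here.

```cpp
#include <string>
#include <vector>

// Stand-in for a remark sink (hypothetical, for illustration only).
using RemarkLog = std::vector<std::string>;

// Returns the VF to use and records a remark only when the hint was clamped.
unsigned clampWithRemark(unsigned userVF, unsigned maxSafeVF, RemarkLog &log) {
  if (userVF <= maxSafeVF)
    return userVF; // hint is safe, nothing to report
  log.push_back("user-specified VF " + std::to_string(userVF) +
                " is unsafe, clamping to " + std::to_string(maxSafeVF));
  return maxSafeVF;
}
```

The point of the separate remark is that the user sees that the hint was overridden and why, rather than only seeing the final VF.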

c-rhodes marked 3 inline comments as done. Nov 10 2020, 9:03 AM

c-rhodes added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	@fhahn @Meinersbur thanks for clarifying
5352–5354	> I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (OptimizationRemarkAnalysis seems to be a suitable category here).

	Done, thanks for pointing out OptimizationRemarkAnalysis

Thanks, looks good to me.

This revision is now accepted and ready to land. Nov 10 2020, 1:46 PM

LGTM as well, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5353	nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	It might also be interesting to add a test case where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).

LGTM as well. It seems the current patch fixes the example I pasted in my previous comment.

> LGTM as well. It seems the current patch fixes the example I pasted in my previous comment.

Yeah, Thanks for that.

c-rhodes added inline comments. Nov 11 2020, 5:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5353	> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.

	Good spot, I'll fix it before merging or update this patch once I get a better idea about the other suggestion you made.
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> It might also be interesting to add a test case where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).

	Is this loop like what you had in mind?

	void foo(int *a, int *b, int N) {
	  #pragma clang loop vectorize(enable) vectorize_width(64)
	  for (int i=0; i<N; ++i) {
	    a[i + 32] = a[i] / b[i];
	  }
	}

	When compiling with:

	./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu

	The user VF of 64 is unsafe so it's clamped to 32 and the vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), although the vectorization is forced so the VF=32 is still chosen.

Rebased (computeFeasibleMaxVF now returns an ElementCount).
Address one of @fhahn's comments.
Added an assert at the top of computeFeasibleMaxVF that UserVF isn't scalable.

c-rhodes marked an inline comment as done. Nov 18 2020, 5:31 AM

c-rhodes added a child revision: D91718: [LV] Legalize scalable VF hints. Nov 18 2020, 8:32 AM

c-rhodes added inline comments. Nov 20 2020, 3:24 AM

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> It might also be interesting to add a test case where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).

	I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?

fhahn added inline comments. Nov 24 2020, 11:37 AM

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	>> It might also be interesting to add a test case where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
	> I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?

	Agreed, such a test doesn't really add much. What I was suggesting was one where the cost model does pick a different VF than the maximum safe one. This is the case that should be handled differently with the current version compared to the first version.

	I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.

	define void @test(i64* nocapture %a, i64* nocapture readonly %b) {
	entry:
	  br label %loop.header

	loop.header:
	  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	  %arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
	  %0 = load i64, i64* %arrayidx, align 4
	  %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 %iv
	  %1 = load i64, i64* %arrayidx2, align 4
	  %add = add nsw i64 %1, %0
	  %2 = add nuw nsw i64 %iv, 16
	  %arrayidx5 = getelementptr inbounds i64, i64* %a, i64 %2
	  %c = icmp eq i64 %1, 120
	  br i1 %c, label %then, label %latch

	then:
	  store i64 %add, i64* %arrayidx5, align 4
	  br label %latch

	latch:
	  %iv.next = add nuw nsw i64 %iv, 1
	  %exitcond.not = icmp eq i64 %iv.next, 1024
	  br i1 %exitcond.not, label %exit, label %loop.header, !llvm.loop !0

	exit:
	  ret void
	}

	!0 = !{!0, !1, !2}
	!1 = !{!"llvm.loop.vectorize.width", i64 32}
	!2 = !{!"llvm.loop.vectorize.enable", i1 true}
21	nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.
25	nit: Is this required? We could just change `%N` to be a `i64` to make the IR more compact.
32	nit: can we drop the `indvars.` prefix to make the IR slightly more readable?
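One reading of the scenario in the first comment above (an unsafe hint of 32, a maximum safe VF of 16, and a cost model that prefers 2) is sketched below. The selection loop, the function names and especially the numbers in `exampleCost` are made up for illustration; they are not taken from LLVM's cost model or from any target.

```cpp
#include <cassert>

// Hypothetical per-lane cost: width 2 is cheapest, everything else is equal.
double exampleCost(unsigned vf) { return vf == 2 ? 1.0 : 1.5; }

// A safe hint is honoured as-is. An unsafe hint only caps the search, and the
// cheapest power-of-two width up to the maximum safe VF is selected.
unsigned selectVF(unsigned userVF, unsigned maxSafeVF,
                  double (*costPerLane)(unsigned)) {
  assert(maxSafeVF >= 1);
  if (userVF != 0 && userVF <= maxSafeVF)
    return userVF;
  unsigned best = 1;
  double bestCost = costPerLane(1);
  for (unsigned vf = 2; vf <= maxSafeVF; vf *= 2) {
    double cost = costPerLane(vf);
    if (cost < bestCost) {
      best = vf;
      bestCost = cost;
    }
  }
  return best;
}
```

Under these assumed costs, `selectVF(32, 16, exampleCost)` returns 2 rather than 16, while a safe hint such as `selectVF(8, 16, exampleCost)` is returned unchanged.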

Address Florian's comments

c-rhodes marked 2 inline comments as done. Nov 25 2020, 4:03 AM

c-rhodes added inline comments.

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with opt -loop-vectorize -mtriple=arm64-apple-iphoneos, the cost model should pick VF = 2 instead of the higher alternatives.

	That makes sense cheers, I've added the test
21	> nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.

	Changed it to use a constant trip count

c-rhodes mentioned this in D91718: [LV] Legalize scalable VF hints. Nov 25 2020, 7:41 AM

I think I've addressed all comments now, so I'll land this at the beginning of next week unless there are any objections before then. Thanks all for reviewing!

Closed by commit rGcba4accda08f: [LV] Clamp VF hint when unsafe (authored by c-rhodes). Dec 1 2020, 3:31 AM

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rGcba4accda08f: [LV] Clamp VF hint when unsafe.

sdesmalen mentioned this in D91077: [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.. Dec 8 2020, 6:17 AM

I configured and built llvm with:
cmake -G Ninja -DLLVM_TARGETS_TO_BUILD:STRING=Hexagon -DLLVM_DEFAULT_TARGET_TRIPLE:STRING=hexagon-unknown-elf -DLLVM_TARGET_ARCH:STRING=hexagon-unknown-elf -DLLVM_ENABLE_ASSERTIONS:BOOL=ON -DLLVM_PARALLEL_LINK_JOBS=1 '-DLLVM_ENABLE_PROJECTS=llvm;clang' -DBUILD_SHARED_LIBS:BOOL=ON ..

This patch causes an assert with this testcase:
typedef struct {
  char a;
} b;
b *c;
int d, e;
int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}

clang -Os -mhvx -fvectorize -mv67 testcase.i -S -o -

llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed.

This revision is now accepted and ready to land. Jan 15 2021, 8:53 AM

clang can also be configured for all targets
cmake -G Ninja -DLLVM_ENABLE_ASSERTIONS:BOOL=ON -DLLVM_PARALLEL_LINK_JOBS=1 '-DLLVM_ENABLE_PROJECTS=llvm;clang' -DBUILD_SHARED_LIBS:BOOL=ON ..

clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.i -S -o -

In D90687#2501292, @iajbar wrote:
> I configured and built llvm with: [...]
> This patch causes an assert with this testcase: [...]
> (build command, testcase and assertion message as posted in the comment above)

Thanks for reporting. I reproduced the crash but found it also occurs before this landed (I tried commit eeba70a), so I'm not sure it's this patch causing the issue. Could you confirm this?

The patch that causes the assert is patch cba4accda08f90. I tried commit c63799fc52ff247 that is before your patch and there is no assert. Thank you.

c-rhodes mentioned this in D94869: [LV] Fix crash when computing max VF too early. Jan 16 2021, 9:00 AM

In D90687#2501881, @iajbar wrote:

> The patch that causes the assert is patch cba4accda08f90. I tried commit c63799fc52ff247 that is before your patch and there is no assert. Thank you.

Apologies, I got this patch mixed up with D91718. You're right the crash is introduced by this patch. I've posted a fix D94869, see patch for more details.

Unrelated to this issue but might be of interest to you, I hit a crash:

Loop IV: clang: /home/culrho01/llvm-project/llvm/include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::Instruction; From = llvm::Value]: Assertion `Val && "isa<> used on a null pointer"' failed.

when compiling your testcase with -mllvm -debug. Full invocation: ./bin/clang -Os -mhvx -fvectorize -mv67 ../testcase.c -S -o - -mllvm -debug.

bjope added a subscriber: bjope. Jan 17 2021, 12:56 AM

Thanks Cullen! I tried your fix D94869 and my benchmark passes.

c-rhodes mentioned this in rG8cda227432f1: [LV] Fix crash when computing max VF too early. Feb 1 2021, 4:15 AM

In D90687#2512510, @iajbar wrote:

> Thanks Cullen! I tried your fix D94869 and my benchmark passes.

Fix has landed, closing this again.

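Revision Contents

Path                                                                          Size
llvm/include/llvm/Analysis/LoopAccessAnalysis.h                               2 lines
llvm/lib/Analysis/LoopAccessAnalysis.cpp                                      4 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                               25 lines
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll           119 lines
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll            43 lines
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll                   51 lines
llvm/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll                   89 lines
llvm/test/Transforms/LoopVectorize/metadata-width.ll                          7 lines
llvm/test/Transforms/LoopVectorize/preserve-dbg-loc-and-loop-metadata.ll      4 lines
llvm/test/Transforms/LoopVectorize/runtime-check.ll                           7 lines
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll                        44 lines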

Diff 302809

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

 struct VectorizerParams {
   static const unsigned MaxVectorWidth;

   /// VF as overridden by the user.
   static unsigned VectorizationFactor;
   /// Interleave factor as overridden by the user.
   static unsigned VectorizationInterleave;
   /// True if force-vector-interleave was specified by the user.
   static bool isInterleaveForced();
+  /// True if force-vector-width was specified by the user.
+  static bool isVFForced();

   /// \When performing memory disambiguation checks at runtime do not
   /// make more than this number of comparisons.
   static unsigned RuntimeMemoryCheckThreshold;
 };

 /// Checks memory dependences among accesses to the same underlying
 /// object to determine whether there vectorization is legal or not (and at

llvm/lib/Analysis/LoopAccessAnalysis.cpp

 static cl::opt<bool> EnableForwardingConflictDetection(
     "store-to-load-forwarding-conflict-detection", cl::Hidden,
     cl::desc("Enable conflict detection in loop-access analysis"),
     cl::init(true));

 bool VectorizerParams::isInterleaveForced() {
   return ::VectorizationInterleave.getNumOccurrences() > 0;
 }

+bool VectorizerParams::isVFForced() {
+  return ::VectorizationFactor.getNumOccurrences() > 0;
+}
+
 Value *llvm::stripIntegerCast(Value *V) {
   if (auto *CI = dyn_cast<CastInst>(V))
     if (CI->getOperand(0)->getType()->isIntegerTy())
       return CI->getOperand(0);
   return V;
 }

 const SCEV *llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


 Optional<unsigned> LoopVectorizationCostModel::computeMaxVF(unsigned UserVF,
   LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
   if (TC == 1) {
     reportVectorizationFailure("Single iteration (non) loop",
         "loop trip count is one, irrelevant for vectorization",
         "SingleIterationLoop", ORE, TheLoop);
     return None;
   }

+  auto MaxVF = computeFeasibleMaxVF(TC);
		dmgreenUnsubmitted Not Done Reply Inline Actions Clamping sounds good. But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers: https://godbolt.org/z/ePdv3K (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here). dmgreen: Clamping sounds good. But I think that there is a difference between the "maximum safe vector…
		c-rhodesAuthorUnsubmitted Not Done Reply Inline Actions But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. So only considering the VF computed by LAA for dependencies rather the backend register widths, I think that makes sense. Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint: void foo(int a, int b, int N) { for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4 ../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize] for (int i=0; i<N; ++i) { ^ It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma LAA is operating on VF=2 and LV on VF=4. What's also interesting is the loop metadata takes precedence over the flag since in `LoopVectorizationLegality` the vector width is initialized with the flag then populated with loop metadata, so the VF according to LAA would come from the flag and in LV the loop hint, assuming both were specified by the user that is. I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts? c-rhodes: > But I think that there is a difference between the "maximum safe vector width" and "the…
		sdesmalenUnsubmitted Not Done Reply Inline Actions the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts? To me they read as slightly different things, but it's good to clear their semantics up. I interpret: `-force-vector-width` as "vectorize it with this width and this width only, and fail if not legal". The `LoopHint` as "try to vectorize with this width but if not legal, feel free to ignore the hint and pick a different width". At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that? sdesmalen: > the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm…
		Meinersbur: `-force-vector-width` is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.
		fhahn:
		> I've updated the patch such that computeFeasibleMaxVF now takes UserVF and clamps this to a VF based on Legal->getMaxSafeRegisterWidth() if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of UserVF needs refactoring. Now I know -force-vector-width won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
		The logic looks good I think. My only concern is that passing UserVF even further down seems to make handling of it even more complicated to follow, but I don't think there's a good way around that because we need some info not available here.
		> At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?
		> -force-vector-width is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.
		Yes that is how it is used today AFAIK, to write LV tests independent of cost-modeling. Agreed with @Meinersbur that it should not matter for regression tests, because I think it is mostly used for testing codegen in legal scenarios. If we decide to adjust the behavior, that's probably best done in a separate patch.
		c-rhodes (Author): @fhahn @Meinersbur thanks for clarifying
		sdesmalen: That's a good point @dmgreen, `MaxSafeRegisterWidth` and the corresponding `getMaxSafeRegisterWidth` in LoopVectorize.cpp and LoopAccessAnalysis.cpp are actually misnomers, because it isn't the maximum safe register width that is calculated, but rather the maximum safe vector bitwidth. Without this patch, this example is vectorized with VF=8 when compiling for Neon (128-bit vectors):
		void foo(int *a, int *b, int *c, int N) {
		  #pragma clang loop vectorize(enable) vectorize_width(8)
		  for (int i=0; i<N; ++i) {
		    a[i + 16] = a[i] + b[i];
		  }
		}
		With this patch, it is now vectorized with VF=4. It seems like the limitation with regards to the actual physical vector register can and should be removed.
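To make the arithmetic behind the Neon example concrete, here is a minimal standalone sketch of how a feasible maximum VF could fall out of three inputs: the widest target register, the maximum safe vector bitwidth from the dependence analysis, and the widest element type. This is not the LoopVectorize implementation: `powerOf2Floor`, `feasibleMaxVF` and all parameter names are hypothetical, and the sketch assumes the safe bitwidth is simply the dependence distance times the element width, ignoring everything else `computeFeasibleMaxVF` takes into account.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: largest power of two that is <= X (0 for X == 0).
static unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical sketch of the bound discussed in the review.
//   WidestRegisterBits: widest vector register of the target (assumption).
//   MaxSafeBits:        maximum safe vector bitwidth from dependence analysis.
//   WidestTypeBits:     widest element type accessed in the loop.
static unsigned feasibleMaxVF(unsigned WidestRegisterBits, unsigned MaxSafeBits,
                              unsigned WidestTypeBits) {
  assert(WidestTypeBits != 0 && "element type must have a non-zero width");
  // The usable width is limited by both the hardware and the dependences.
  unsigned UsableBits = std::min(WidestRegisterBits, MaxSafeBits);
  // Round down to a power of two; the dependence bound need not be one.
  unsigned VF = powerOf2Floor(UsableBits / WidestTypeBits);
  // Fall back to scalar (VF = 1) if not even one element fits.
  return VF == 0 ? 1 : VF;
}
```

Under these assumptions, `feasibleMaxVF(128, 16 * 32, 32)` yields 4 for the Neon example, while the dependence distance alone (for instance with a hypothetical 512-bit register) would permit 16. That gap is the register-width limitation sdesmalen suggests removing.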
		if (UserVF) {
		dmgreen: I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing. It should probably be based on Legal.getMaxSafeRegisterWidth()?
		fhahn:
		> It should probably be based on Legal.getMaxSafeRegisterWidth()?
		computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint if it is not useful, why not just use the largest safe vectorization factor instead of bailing out? Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
		c-rhodes (Author):
		> > It should probably be based on Legal.getMaxSafeRegisterWidth()?
		> computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint if it is not useful, why not just use the largest safe vectorization factor instead of bailing out?
		Yeah that's right, `computeFeasibleMaxVF` is based on `Legal->getMaxSafeRegisterWidth()` and bounded by the widest register according to the TTI.
		> Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
		Sure, clamping sounds good to me and @Meinersbur suggested this as well, I'll update the patch.
		Meinersbur: `Legal.getMaxSafeRegisterWidth()` is called within `computeFeasibleMaxVF`. With `computeFeasibleMaxVF` considering additional architecture concerns, can these be just ignored?
		c-rhodes (Author):
		> I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing.
		I did find that the X86 tests that have changed did so because of a backend vector width of 128 bits. I expect these changes will still be required when clamping the UserVF to the maximum vectorization factor as suggested.
		Meinersbur: With `computeFeasibleMaxVF` already called later, could you store its result for both uses? I suggest moving `UserVF ? UserVF : computeFeasibleMaxVF(TC)`, which is duplicated multiple times below, before this condition. Such as: `auto MaxVF = computeFeasibleMaxVF(TC); if (UserVF) { if (UserVF > MaxVF) { ... } MaxVF = UserVF; }`
		if (!VectorizerParams::isVFForced() && UserVF > MaxVF)
		LLVM_DEBUG(
		dbgs() << "LV: User VF=" << UserVF
		<< " is unsafe, using maximum safe VF=" << MaxVF
		<< ". This can be overridden with '-force-vector-width=X'.\n");
		else
		MaxVF = UserVF;
		}
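The new hunk above can be read as the following standalone sketch. It is illustrative only: `ClampResult` and `clampUserVF` are hypothetical names, `VFForced` stands in for `VectorizerParams::isVFForced()`, and `MaxVF` is assumed to be the value already returned by `computeFeasibleMaxVF`.

```cpp
#include <cassert>

// Hypothetical result type: the VF to use and whether the hint was clamped.
struct ClampResult {
  unsigned VF;
  bool Clamped;
};

// Sketch of the clamping rule from the hunk above.
//   UserVF:   VF requested via loop hint or flag, 0 if none was given.
//   MaxVF:    maximum feasible VF computed by the cost model (assumed input).
//   VFForced: true when the VF comes from -force-vector-width.
static ClampResult clampUserVF(unsigned UserVF, unsigned MaxVF, bool VFForced) {
  // No user request: use the computed maximum.
  if (UserVF == 0)
    return {MaxVF, false};
  // An unsafe hint is clamped unless the VF was explicitly forced.
  if (!VFForced && UserVF > MaxVF)
    return {MaxVF, true};
  // Otherwise the user's VF is honoured as-is.
  return {UserVF, false};
}
```

For the loop in the summary (maximum safe VF of 2, hint of 4), `clampUserVF(4, 2, false)` returns VF 2 with `Clamped` set, whereas `clampUserVF(4, 2, true)` keeps VF 4 because the flag overrides the check.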

switch (ScalarEpilogueStatus) {		switch (ScalarEpilogueStatus) {
case CM_ScalarEpilogueAllowed:		case CM_ScalarEpilogueAllowed:
return UserVF ? UserVF : computeFeasibleMaxVF(TC);		return MaxVF;
case CM_ScalarEpilogueNotNeededUsePredicate:		case CM_ScalarEpilogueNotNeededUsePredicate:
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: vector predicate hint/switch found.\n"		dbgs() << "LV: vector predicate hint/switch found.\n"
<< "LV: Not allowing scalar epilogue, creating predicated "		<< "LV: Not allowing scalar epilogue, creating predicated "
<< "vector loop.\n");		<< "vector loop.\n");
break;		break;
case CM_ScalarEpilogueNotAllowedLowTripLoop:		case CM_ScalarEpilogueNotAllowedLowTripLoop:
// fallthrough as a special case of OptForSize		// fallthrough as a special case of OptForSize
Show All 19 Lines	Optional<unsigned> LoopVectorizationCostModel::computeMaxVF(unsigned UserVF,
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
}		}

unsigned MaxVF = UserVF ? UserVF : computeFeasibleMaxVF(TC);		assert(isPowerOf2_32(MaxVF) && "MaxVF must be a power of 2");
assert((UserVF || isPowerOf2_32(MaxVF)) && "MaxVF must be a power of 2");
unsigned MaxVFtimesIC = UserIC ? MaxVF * UserIC : MaxVF;		unsigned MaxVFtimesIC = UserIC ? MaxVF * UserIC : MaxVF;
if (TC > 0 && TC % MaxVFtimesIC == 0) {		if (TC > 0 && TC % MaxVFtimesIC == 0) {
// Accept MaxVF if we do not have a tail.		// Accept MaxVF if we do not have a tail.
LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");		LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
return MaxVF;		return MaxVF;
}		}

// If we don't know the precise trip count, or if the trip count that we		// If we don't know the precise trip count, or if the trip count that we
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount) {
// Get the maximum safe dependence distance in bits computed by LAA.		// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
// the memory accesses that is most restrictive (involved in the smallest		// the memory accesses that is most restrictive (involved in the smallest
// dependence distance).		// dependence distance).
unsigned MaxSafeRegisterWidth = Legal->getMaxSafeRegisterWidth();		unsigned MaxSafeRegisterWidth = Legal->getMaxSafeRegisterWidth();

WidestRegister = std::min(WidestRegister, MaxSafeRegisterWidth);		WidestRegister = std::min(WidestRegister, MaxSafeRegisterWidth);

// Ensure MaxVF is a power of 2; the dependence distance bound may not be.		// Ensure MaxVF is a power of 2; the dependence distance bound may not be.
		Meinersbur: [style] LLVM coding style (https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction…) prefers explicit types in declarations.
// Note that both WidestRegister and WidestType may not be a powers of 2.		// Note that both WidestRegister and WidestType may not be a powers of 2.
unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);		unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);

LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType		LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
<< " / " << WidestType << " bits.\n");		<< " / " << WidestType << " bits.\n");
		fhahn: nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
		c-rhodes (Author):
		> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
		Good spot, I'll fix it before merging or update this patch once I get a better idea about the other suggestion you made.
LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "		LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "
		Meinersbur: Should there (also) be a diagnostic warning (-Rpass) to inform the user that the value has been clamped? (Or maybe there already is one and I don't see where it is done.)
		fhahn: I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (`OptimizationRemarkAnalysis` seems to be a suitable category here).
		c-rhodes (Author):
		> I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (OptimizationRemarkAnalysis seems to be a suitable category here).
		Done, thanks for pointing out OptimizationRemarkAnalysis
<< WidestRegister << " bits.\n");		<< WidestRegister << " bits.\n");

assert(MaxVectorSize <= 256 && "Did not expect to pack so many elements"		assert(MaxVectorSize <= 256 && "Did not expect to pack so many elements"
" into one vector!");		" into one vector!");
if (MaxVectorSize == 0) {		if (MaxVectorSize == 0) {
LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");		LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");
MaxVectorSize = 1;		MaxVectorSize = 1;
return MaxVectorSize;		return MaxVectorSize;
▲ Show 20 Lines • Show All 1,642 Lines • ▼ Show 20 Lines	LLVM_DEBUG(
"which requires masked-interleaved support.\n");		"which requires masked-interleaved support.\n");
if (CM.InterleaveInfo.invalidateGroups())		if (CM.InterleaveInfo.invalidateGroups())
// Invalidating interleave groups also requires invalidating all decisions		// Invalidating interleave groups also requires invalidating all decisions
// based on them, which includes widening decisions and uniform and scalar		// based on them, which includes widening decisions and uniform and scalar
// values.		// values.
CM.invalidateCostModelingDecisions();		CM.invalidateCostModelingDecisions();
}		}

if (!UserVF.isZero()) {		unsigned MaxVF = MaybeMaxVF.getValue();
		assert(MaxVF != 0 && "MaxVF is zero.");

		if (!UserVF.isZero() &&
		(UserVF.getKnownMinValue() <= MaxVF || VectorizerParams::isVFForced())) {
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&		assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),		buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),
UserVF.getKnownMinValue());		UserVF.getKnownMinValue());
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {{UserVF, 0}};		return {{UserVF, 0}};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");

for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
// Collect Uniform and Scalar instructions after vectorization with VF.		// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(ElementCount::getFixed(VF));		CM.collectUniformsAndScalars(ElementCount::getFixed(VF));

// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
if (VF > 1)		if (VF > 1)
CM.collectInstsToScalarize(ElementCount::getFixed(VF));		CM.collectInstsToScalarize(ElementCount::getFixed(VF));
▲ Show 20 Lines • Show All 1,678 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll

; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s --check-prefix=CHECK-VF2
		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -force-vector-width=8 -S < %s | FileCheck %s --check-prefix=CHECK-VF8

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

define void @sin_f64(double* nocapture %varray) {		define void @sin_f64(double* nocapture %varray) {
; CHECK-LABEL: @sin_f64(		; CHECK-VF2-LABEL: @sin_f64(
; CHECK-LABEL: vector.body		; CHECK-VF2-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_sin(<2 x double> [[TMP4:%.*]])		; CHECK-VF2: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_sin(<2 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
Show All 9 Lines
}		}

!1 = distinct !{!1, !2, !3}		!1 = distinct !{!1, !2, !3}
!2 = !{!"llvm.loop.vectorize.width", i32 2}		!2 = !{!"llvm.loop.vectorize.width", i32 2}
!3 = !{!"llvm.loop.vectorize.enable", i1 true}		!3 = !{!"llvm.loop.vectorize.enable", i1 true}


define void @sin_f32(float* nocapture %varray) {		define void @sin_f32(float* nocapture %varray) {
; CHECK-LABEL: @sin_f32(		; CHECK-VF8-LABEL: @sin_f32(
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_sinf(<8 x float> [[TMP4:%.*]])		; CHECK-VF8: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_sinf(<8 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call float @sinf(float %conv)		%call = tail call float @sinf(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !21		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!21 = distinct !{!21, !22, !23}
!22 = !{!"llvm.loop.vectorize.width", i32 8}
!23 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @sin_f64_intrinsic(double* nocapture %varray) {		define void @sin_f64_intrinsic(double* nocapture %varray) {
; CHECK-LABEL: @sin_f64_intrinsic(		; CHECK-LABEL: @sin_f64_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_sin(<2 x double> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_sin(<2 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!31 = distinct !{!31, !32, !33}		!31 = distinct !{!31, !32, !33}
!32 = !{!"llvm.loop.vectorize.width", i32 2}		!32 = !{!"llvm.loop.vectorize.width", i32 2}
!33 = !{!"llvm.loop.vectorize.enable", i1 true}		!33 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @sin_f32_intrinsic(float* nocapture %varray) {		define void @sin_f32_intrinsic(float* nocapture %varray) {
; CHECK-LABEL: @sin_f32_intrinsic(		; CHECK-VF8-LABEL: @sin_f32_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_sinf(<8 x float> [[TMP4:%.*]])		; CHECK-VF8: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_sinf(<8 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call float @llvm.sin.f32(float %conv)		%call = tail call float @llvm.sin.f32(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !41		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!41 = distinct !{!41, !42, !43}
!42 = !{!"llvm.loop.vectorize.width", i32 8}
!43 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f64(double* nocapture %varray) {		define void @cos_f64(double* nocapture %varray) {
; CHECK-LABEL: @cos_f64(		; CHECK-LABEL: @cos_f64(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_cos(<2 x double> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_cos(<2 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!51 = distinct !{!51, !52, !53}		!51 = distinct !{!51, !52, !53}
!52 = !{!"llvm.loop.vectorize.width", i32 2}		!52 = !{!"llvm.loop.vectorize.width", i32 2}
!53 = !{!"llvm.loop.vectorize.enable", i1 true}		!53 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f32(float* nocapture %varray) {		define void @cos_f32(float* nocapture %varray) {
; CHECK-LABEL: @cos_f32(		; CHECK-VF8-LABEL: @cos_f32(
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_cosf(<8 x float> [[TMP4:%.*]])		; CHECK-VF8: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_cosf(<8 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call float @cosf(float %conv)		%call = tail call float @cosf(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !61		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!61 = distinct !{!61, !62, !63}
!62 = !{!"llvm.loop.vectorize.width", i32 8}
!63 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f64_intrinsic(double* nocapture %varray) {		define void @cos_f64_intrinsic(double* nocapture %varray) {
; CHECK-LABEL: @cos_f64_intrinsic(		; CHECK-LABEL: @cos_f64_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_cos(<2 x double> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <2 x double> @_ZGVbN2v_cos(<2 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!71 = distinct !{!71, !72, !73}		!71 = distinct !{!71, !72, !73}
!72 = !{!"llvm.loop.vectorize.width", i32 2}		!72 = !{!"llvm.loop.vectorize.width", i32 2}
!73 = !{!"llvm.loop.vectorize.enable", i1 true}		!73 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f32_intrinsic(float* nocapture %varray) {		define void @cos_f32_intrinsic(float* nocapture %varray) {
; CHECK-LABEL: @cos_f32_intrinsic(		; CHECK-VF8-LABEL: @cos_f32_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_cosf(<8 x float> [[TMP4:%.*]])		; CHECK-VF8: [[TMP5:%.*]] = call <8 x float> @_ZGVdN8v_cosf(<8 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call float @llvm.cos.f32(float %conv)		%call = tail call float @llvm.cos.f32(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !81		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!81 = distinct !{!81, !82, !83}
!82 = !{!"llvm.loop.vectorize.width", i32 8}
!83 = !{!"llvm.loop.vectorize.enable", i1 true}


define void @exp_f32(float* nocapture %varray) {		define void @exp_f32(float* nocapture %varray) {
; CHECK-LABEL: @exp_f32		; CHECK-VF8-LABEL: @exp_f32
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: <8 x float> @_ZGVdN8v_expf		; CHECK-VF8: <8 x float> @_ZGVdN8v_expf
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call fast float @expf(float %conv)		%call = tail call fast float @expf(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !91		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!91 = distinct !{!91, !92, !93}
!92 = !{!"llvm.loop.vectorize.width", i32 8}
!93 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @exp_f32_intrin(float* nocapture %varray) {		define void @exp_f32_intrin(float* nocapture %varray) {
; CHECK-LABEL: @exp_f32_intrin		; CHECK-VF8-LABEL: @exp_f32_intrin
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: <8 x float> @_ZGVdN8v_expf		; CHECK-VF8: <8 x float> @_ZGVdN8v_expf
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call fast float @llvm.exp.f32(float %conv)		%call = tail call fast float @llvm.exp.f32(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !101		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!101 = distinct !{!101, !102, !103}
!102 = !{!"llvm.loop.vectorize.width", i32 8}
!103 = !{!"llvm.loop.vectorize.enable", i1 true}


define void @log_f32(float* nocapture %varray) {		define void @log_f32(float* nocapture %varray) {
; CHECK-LABEL: @log_f32		; CHECK-VF8-LABEL: @log_f32
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: <8 x float> @_ZGVdN8v_logf		; CHECK-VF8: <8 x float> @_ZGVdN8v_logf
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%call = tail call fast float @logf(float %conv)		%call = tail call fast float @logf(float %conv)
%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
store float %call, float* %arrayidx, align 4		store float %call, float* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !111		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!111 = distinct !{!111, !112, !113}
!112 = !{!"llvm.loop.vectorize.width", i32 8}
!113 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {		define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
; CHECK-LABEL: @pow_f32		; CHECK-VF8-LABEL: @pow_f32
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: <8 x float> @_ZGVdN8vv_powf		; CHECK-VF8: <8 x float> @_ZGVdN8vv_powf
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv
%tmp1 = load float, float* %arrayidx, align 4		%tmp1 = load float, float* %arrayidx, align 4
%tmp2 = tail call fast float @powf(float %conv, float %tmp1)		%tmp2 = tail call fast float @powf(float %conv, float %tmp1)
%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv
store float %tmp2, float* %arrayidx2, align 4		store float %tmp2, float* %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !121		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!121 = distinct !{!121, !122, !123}
!122 = !{!"llvm.loop.vectorize.width", i32 8}
!123 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @pow_f32_intrin(float* nocapture %varray, float* nocapture readonly %exp) {		define void @pow_f32_intrin(float* nocapture %varray, float* nocapture readonly %exp) {
; CHECK-LABEL: @pow_f32_intrin		; CHECK-VF8-LABEL: @pow_f32_intrin
; CHECK-LABEL: vector.body		; CHECK-VF8-LABEL: vector.body
; CHECK: <8 x float> @_ZGVdN8vv_powf		; CHECK-VF8: <8 x float> @_ZGVdN8vv_powf
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to float		%conv = sitofp i32 %tmp to float
%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv
%tmp1 = load float, float* %arrayidx, align 4		%tmp1 = load float, float* %arrayidx, align 4
%tmp2 = tail call fast float @llvm.pow.f32(float %conv, float %tmp1)		%tmp2 = tail call fast float @llvm.pow.f32(float %conv, float %tmp1)
%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv
store float %tmp2, float* %arrayidx2, align 4		store float %tmp2, float* %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !131		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!131 = distinct !{!131, !132, !133}
!132 = !{!"llvm.loop.vectorize.width", i32 8}
!133 = !{!"llvm.loop.vectorize.enable", i1 true}

attributes #0 = { nounwind readnone }		attributes #0 = { nounwind readnone }

declare double @sin(double) #0		declare double @sin(double) #0
declare float @sinf(float) #0		declare float @sinf(float) #0
declare double @llvm.sin.f64(double) #0		declare double @llvm.sin.f64(double) #0
declare float @llvm.sin.f32(float) #0		declare float @llvm.sin.f32(float) #0
declare double @cos(double) #0		declare double @cos(double) #0
declare float @cosf(float) #0		declare float @cosf(float) #0
declare double @llvm.cos.f64(double) #0		declare double @llvm.cos.f64(double) #0
declare float @llvm.cos.f32(float) #0		declare float @llvm.cos.f32(float) #0
declare float @expf(float) #0		declare float @expf(float) #0
declare float @powf(float, float) #0		declare float @powf(float, float) #0
declare float @llvm.exp.f32(float) #0		declare float @llvm.exp.f32(float) #0
declare float @logf(float) #0		declare float @logf(float) #0
declare float @llvm.pow.f32(float, float) #0		declare float @llvm.pow.f32(float, float) #0

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll

; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s
		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-FORCE-VF4
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

define void @exp_f32(float* nocapture %varray) {		define void @exp_f32(float* nocapture %varray) {
; CHECK-LABEL: @exp_f32		; CHECK-LABEL: @exp_f32
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: <4 x float> @_ZGVbN4v___expf_finite		; CHECK: <4 x float> @_ZGVbN4v___expf_finite
; CHECK: ret		; CHECK: ret
Show All 15 Lines	for.end: ; preds = %for.body
ret void		ret void
}		}

!1 = distinct !{!1, !2, !3}		!1 = distinct !{!1, !2, !3}
!2 = !{!"llvm.loop.vectorize.width", i32 4}		!2 = !{!"llvm.loop.vectorize.width", i32 4}
!3 = !{!"llvm.loop.vectorize.enable", i1 true}		!3 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @exp_f64(double* nocapture %varray) {		define void @exp_f64(double* nocapture %varray) {
; CHECK-LABEL: @exp_f64		; CHECK-FORCE-VF4-LABEL: @exp_f64
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: <4 x double> @_ZGVdN4v___exp_finite		; CHECK-FORCE-VF4: <4 x double> @_ZGVdN4v___exp_finite
; CHECK: ret		; CHECK-FORCE-VF4: ret
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call fast double @__exp_finite(double %conv)		%call = tail call fast double @__exp_finite(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !11		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!11 = distinct !{!11, !12, !13}
!12 = !{!"llvm.loop.vectorize.width", i32 4}
!13 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @log_f32(float* nocapture %varray) {		define void @log_f32(float* nocapture %varray) {
; CHECK-LABEL: @log_f32		; CHECK-LABEL: @log_f32
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: <4 x float> @_ZGVbN4v___logf_finite		; CHECK: <4 x float> @_ZGVbN4v___logf_finite
; CHECK: ret		; CHECK: ret
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end: ; preds = %for.body
ret void		ret void
}		}

!21 = distinct !{!21, !22, !23}		!21 = distinct !{!21, !22, !23}
!22 = !{!"llvm.loop.vectorize.width", i32 4}		!22 = !{!"llvm.loop.vectorize.width", i32 4}
!23 = !{!"llvm.loop.vectorize.enable", i1 true}		!23 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @log_f64(double* nocapture %varray) {		define void @log_f64(double* nocapture %varray) {
; CHECK-LABEL: @log_f64		; CHECK-FORCE-VF4-LABEL: @log_f64
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: <4 x double> @_ZGVdN4v___log_finite		; CHECK-FORCE-VF4: <4 x double> @_ZGVdN4v___log_finite
; CHECK: ret		; CHECK-FORCE-VF4: ret
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call fast double @__log_finite(double %conv)		%call = tail call fast double @__log_finite(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!31 = distinct !{!31, !32, !33}
!32 = !{!"llvm.loop.vectorize.width", i32 4}
!33 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {		define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
; CHECK-LABEL: @pow_f32		; CHECK-LABEL: @pow_f32
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: <4 x float> @_ZGVbN4vv___powf_finite		; CHECK: <4 x float> @_ZGVbN4vv___powf_finite
; CHECK: ret		; CHECK: ret
entry:		entry:
br label %for.body		br label %for.body

Show All 14 Lines	for.end: ; preds = %for.body
ret void		ret void
}		}

!41 = distinct !{!41, !42, !43}		!41 = distinct !{!41, !42, !43}
!42 = !{!"llvm.loop.vectorize.width", i32 4}		!42 = !{!"llvm.loop.vectorize.width", i32 4}
!43 = !{!"llvm.loop.vectorize.enable", i1 true}		!43 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {		define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
; CHECK-LABEL: @pow_f64		; CHECK-FORCE-VF4-LABEL: @pow_f64
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: <4 x double> @_ZGVdN4vv___pow_finite		; CHECK-FORCE-VF4: <4 x double> @_ZGVdN4vv___pow_finite
; CHECK: ret		; CHECK-FORCE-VF4: ret
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%tmp = trunc i64 %indvars.iv to i32		%tmp = trunc i64 %indvars.iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%arrayidx = getelementptr inbounds double, double* %exp, i64 %indvars.iv		%arrayidx = getelementptr inbounds double, double* %exp, i64 %indvars.iv
%tmp1 = load double, double* %arrayidx, align 4		%tmp1 = load double, double* %arrayidx, align 4
%tmp2 = tail call fast double @__pow_finite(double %conv, double %tmp1)		%tmp2 = tail call fast double @__pow_finite(double %conv, double %tmp1)
%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %indvars.iv
store double %tmp2, double* %arrayidx2, align 4		store double %tmp2, double* %arrayidx2, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1000		%exitcond = icmp eq i64 %indvars.iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !51		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

!51 = distinct !{!51, !52, !53}
!52 = !{!"llvm.loop.vectorize.width", i32 4}
!53 = !{!"llvm.loop.vectorize.enable", i1 true}

declare float @__expf_finite(float) #0		declare float @__expf_finite(float) #0
declare double @__exp_finite(double) #0		declare double @__exp_finite(double) #0
declare float @__logf_finite(float) #0		declare float @__logf_finite(float) #0
declare double @__log_finite(double) #0		declare double @__log_finite(double) #0
declare float @__powf_finite(float, float) #0		declare float @__powf_finite(float, float) #0
declare double @__pow_finite(double, double) #0		declare double @__pow_finite(double, double) #0

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll

; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s
		; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-FORCE-VF4

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

define void @sin_f64(double* nocapture %varray) {		define void @sin_f64(double* nocapture %varray) {
; CHECK-LABEL: @sin_f64(		; CHECK-FORCE-VF4-LABEL: @sin_f64(
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4:%.*]])		; CHECK-FORCE-VF4: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call double @sin(double %conv)		%call = tail call double @sin(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!1 = distinct !{!1, !2, !3}
!2 = !{!"llvm.loop.vectorize.width", i32 4}
!3 = !{!"llvm.loop.vectorize.enable", i1 true}


define void @sin_f32(float* nocapture %varray) {		define void @sin_f32(float* nocapture %varray) {
; CHECK-LABEL: @sin_f32(		; CHECK-LABEL: @sin_f32(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!21 = distinct !{!21, !22, !23}		!21 = distinct !{!21, !22, !23}
!22 = !{!"llvm.loop.vectorize.width", i32 4}		!22 = !{!"llvm.loop.vectorize.width", i32 4}
!23 = !{!"llvm.loop.vectorize.enable", i1 true}		!23 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @sin_f64_intrinsic(double* nocapture %varray) {		define void @sin_f64_intrinsic(double* nocapture %varray) {
; CHECK-LABEL: @sin_f64_intrinsic(		; CHECK-FORCE-VF4-LABEL: @sin_f64_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4:%.*]])		; CHECK-FORCE-VF4: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call double @llvm.sin.f64(double %conv)		%call = tail call double @llvm.sin.f64(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!31 = distinct !{!31, !32, !33}
!32 = !{!"llvm.loop.vectorize.width", i32 4}
!33 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @sin_f32_intrinsic(float* nocapture %varray) {		define void @sin_f32_intrinsic(float* nocapture %varray) {
; CHECK-LABEL: @sin_f32_intrinsic(		; CHECK-LABEL: @sin_f32_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!41 = distinct !{!41, !42, !43}		!41 = distinct !{!41, !42, !43}
!42 = !{!"llvm.loop.vectorize.width", i32 4}		!42 = !{!"llvm.loop.vectorize.width", i32 4}
!43 = !{!"llvm.loop.vectorize.enable", i1 true}		!43 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f64(double* nocapture %varray) {		define void @cos_f64(double* nocapture %varray) {
; CHECK-LABEL: @cos_f64(		; CHECK-FORCE-VF4-LABEL: @cos_f64(
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4:%.*]])		; CHECK-FORCE-VF4: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call double @cos(double %conv)		%call = tail call double @cos(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !51		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!51 = distinct !{!51, !52, !53}
!52 = !{!"llvm.loop.vectorize.width", i32 4}
!53 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f32(float* nocapture %varray) {		define void @cos_f32(float* nocapture %varray) {
; CHECK-LABEL: @cos_f32(		; CHECK-LABEL: @cos_f32(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

Show All 12 Lines	for.end:
ret void		ret void
}		}

!61 = distinct !{!61, !62, !63}		!61 = distinct !{!61, !62, !63}
!62 = !{!"llvm.loop.vectorize.width", i32 4}		!62 = !{!"llvm.loop.vectorize.width", i32 4}
!63 = !{!"llvm.loop.vectorize.enable", i1 true}		!63 = !{!"llvm.loop.vectorize.enable", i1 true}

define void @cos_f64_intrinsic(double* nocapture %varray) {		define void @cos_f64_intrinsic(double* nocapture %varray) {
; CHECK-LABEL: @cos_f64_intrinsic(		; CHECK-FORCE-VF4-LABEL: @cos_f64_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-FORCE-VF4-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4:%.*]])		; CHECK-FORCE-VF4: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%tmp = trunc i64 %iv to i32		%tmp = trunc i64 %iv to i32
%conv = sitofp i32 %tmp to double		%conv = sitofp i32 %tmp to double
%call = tail call double @llvm.cos.f64(double %conv)		%call = tail call double @llvm.cos.f64(double %conv)
%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv		%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
store double %call, double* %arrayidx, align 4		store double %call, double* %arrayidx, align 4
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond = icmp eq i64 %iv.next, 1000		%exitcond = icmp eq i64 %iv.next, 1000
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !71		br i1 %exitcond, label %for.end, label %for.body

for.end:		for.end:
ret void		ret void
}		}

!71 = distinct !{!71, !72, !73}
fhahn: Why those test changes?
c-rhodes (author): > Why those test changes? The maximum VF is 2: it is computed as `WidestRegister / WidestType`, where the widest register is 128-bit.
!72 = !{!"llvm.loop.vectorize.width", i32 4}
!73 = !{!"llvm.loop.vectorize.enable", i1 true}
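
The reply from c-rhodes above gives the bound as `WidestRegister / WidestType`. As a minimal sketch of that arithmetic (the helper names `maxFeasibleVF` and `clampVFHint` are hypothetical and are not LLVM's API; the actual logic in LoopVectorize.cpp may differ, and whether an oversized hint is clamped or rejected depends on the revision of the patch):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not LLVM's API): how many elements of the widest
// scalar type fit in one vector register.
inline unsigned maxFeasibleVF(unsigned WidestRegisterBits,
                              unsigned WidestTypeBits) {
  assert(WidestTypeBits != 0 && "element type must have a nonzero width");
  return WidestRegisterBits / WidestTypeBits;
}

// Hypothetical helper: limit a user-supplied VF hint to that bound.
inline unsigned clampVFHint(unsigned UserVF, unsigned WidestRegisterBits,
                            unsigned WidestTypeBits) {
  return std::min(UserVF, maxFeasibleVF(WidestRegisterBits, WidestTypeBits));
}
```

With a 128-bit register, a 64-bit `double` gives a bound of 2, so the `vectorize.width` hint of 4 exceeds it; this matches the f64 tests moving to the `-force-vector-width=4` RUN line and the `CHECK-FORCE-VF4` prefix. A 32-bit `float` gives a bound of 4, so the f32 tests keep their plain `CHECK` lines.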

define void @cos_f32_intrinsic(float* nocapture %varray) {		define void @cos_f32_intrinsic(float* nocapture %varray) {
; CHECK-LABEL: @cos_f32_intrinsic(		; CHECK-LABEL: @cos_f32_intrinsic(
; CHECK-LABEL: vector.body		; CHECK-LABEL: vector.body
; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP4:%.*]])		; CHECK: [[TMP5:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP4:%.*]])
;		;
entry:		entry:
br label %for.body		br label %for.body

▲ Show 20 Lines • Show All 170 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/X86/svml-calls-finite.ll

	; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s			; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s
				; RUN: opt -vector-library=SVML -inject-tli-mappings -loop-vectorize -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-FORCE-VF4

	; Test to verify that when math headers are built with			; Test to verify that when math headers are built with
	; __FINITE_MATH_ONLY__ enabled, causing use of __<func>_finite			; __FINITE_MATH_ONLY__ enabled, causing use of __<func>_finite
	; function versions, vectorization can map these to vector versions.			; function versions, vectorization can map these to vector versions.

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	Show All 23 Lines

	!1 = distinct !{!1, !2, !3}			!1 = distinct !{!1, !2, !3}
	!2 = !{!"llvm.loop.vectorize.width", i32 4}			!2 = !{!"llvm.loop.vectorize.width", i32 4}
	!3 = !{!"llvm.loop.vectorize.enable", i1 true}			!3 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__exp_finite(double) #0			declare double @__exp_finite(double) #0

	; CHECK-LABEL: @exp_f64			; CHECK-FORCE-VF4-LABEL: @exp_f64
	; CHECK: <4 x double> @__svml_exp4			; CHECK-FORCE-VF4: <4 x double> @__svml_exp4
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @exp_f64(double* nocapture %varray) {			define void @exp_f64(double* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call fast double @__exp_finite(double %conv)			%call = tail call fast double @__exp_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !11			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!11 = distinct !{!11, !12, !13}
	!12 = !{!"llvm.loop.vectorize.width", i32 4}
	!13 = !{!"llvm.loop.vectorize.enable", i1 true}




	declare float @__logf_finite(float) #0			declare float @__logf_finite(float) #0

	; CHECK-LABEL: @log_f32			; CHECK-LABEL: @log_f32
	; CHECK: <4 x float> @__svml_logf4			; CHECK: <4 x float> @__svml_logf4
	; CHECK: ret			; CHECK: ret
	define void @log_f32(float* nocapture %varray) {			define void @log_f32(float* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body
	Show All 15 Lines

	!21 = distinct !{!21, !22, !23}			!21 = distinct !{!21, !22, !23}
	!22 = !{!"llvm.loop.vectorize.width", i32 4}			!22 = !{!"llvm.loop.vectorize.width", i32 4}
	!23 = !{!"llvm.loop.vectorize.enable", i1 true}			!23 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__log_finite(double) #0			declare double @__log_finite(double) #0

	; CHECK-LABEL: @log_f64			; CHECK-FORCE-VF4-LABEL: @log_f64
	; CHECK: <4 x double> @__svml_log4			; CHECK-FORCE-VF4: <4 x double> @__svml_log4
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @log_f64(double* nocapture %varray) {			define void @log_f64(double* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call fast double @__log_finite(double %conv)			%call = tail call fast double @__log_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!31 = distinct !{!31, !32, !33}
	!32 = !{!"llvm.loop.vectorize.width", i32 4}
	!33 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare float @__powf_finite(float, float) #0			declare float @__powf_finite(float, float) #0

	; CHECK-LABEL: @pow_f32			; CHECK-LABEL: @pow_f32
	; CHECK: <4 x float> @__svml_powf4			; CHECK: <4 x float> @__svml_powf4
	; CHECK: ret			; CHECK: ret
	define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {			define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
	entry:			entry:
	br label %for.body			br label %for.body
	Show All 17 Lines

	!41 = distinct !{!41, !42, !43}			!41 = distinct !{!41, !42, !43}
	!42 = !{!"llvm.loop.vectorize.width", i32 4}			!42 = !{!"llvm.loop.vectorize.width", i32 4}
	!43 = !{!"llvm.loop.vectorize.enable", i1 true}			!43 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__pow_finite(double, double) #0			declare double @__pow_finite(double, double) #0

	; CHECK-LABEL: @pow_f64			; CHECK-FORCE-VF4-LABEL: @pow_f64
	; CHECK: <4 x double> @__svml_pow4			; CHECK-FORCE-VF4: <4 x double> @__svml_pow4
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {			define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%arrayidx = getelementptr inbounds double, double* %exp, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %exp, i64 %indvars.iv
	%tmp1 = load double, double* %arrayidx, align 4			%tmp1 = load double, double* %arrayidx, align 4
	%tmp2 = tail call fast double @__pow_finite(double %conv, double %tmp1)			%tmp2 = tail call fast double @__pow_finite(double %conv, double %tmp1)
	%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %tmp2, double* %arrayidx2, align 4			store double %tmp2, double* %arrayidx2, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !51			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!51 = distinct !{!51, !52, !53}
	!52 = !{!"llvm.loop.vectorize.width", i32 4}
	!53 = !{!"llvm.loop.vectorize.enable", i1 true}

	declare float @__exp2f_finite(float) #0			declare float @__exp2f_finite(float) #0

	define void @exp2f_finite(float* nocapture %varray) {			define void @exp2f_finite(float* nocapture %varray) {
	; CHECK-LABEL: @exp2f_finite(			; CHECK-LABEL: @exp2f_finite(
	; CHECK: call <4 x float> @__svml_exp2f4(<4 x float> %{{.*}})			; CHECK: call <4 x float> @__svml_exp2f4(<4 x float> %{{.*}})
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	Show All 16 Lines

	!61 = distinct !{!61, !62, !63}			!61 = distinct !{!61, !62, !63}
	!62 = !{!"llvm.loop.vectorize.width", i32 4}			!62 = !{!"llvm.loop.vectorize.width", i32 4}
	!63 = !{!"llvm.loop.vectorize.enable", i1 true}			!63 = !{!"llvm.loop.vectorize.enable", i1 true}

	declare double @__exp2_finite(double) #0			declare double @__exp2_finite(double) #0

	define void @exp2_finite(double* nocapture %varray) {			define void @exp2_finite(double* nocapture %varray) {
	; CHECK-LABEL: @exp2_finite(			; CHECK-FORCE-VF4-LABEL: @exp2_finite(
	; CHECK: call <4 x double> @__svml_exp24(<4 x double> {{.*}})			; CHECK-FORCE-VF4: call <4 x double> @__svml_exp24(<4 x double> {{.*}})
	; CHECK: ret void			; CHECK-FORCE-VF4: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @__exp2_finite(double %conv)			%call = tail call double @__exp2_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !71			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	!71 = distinct !{!71, !72, !73}
	!72 = !{!"llvm.loop.vectorize.width", i32 4}
	!73 = !{!"llvm.loop.vectorize.enable", i1 true}

	declare float @__log2f_finite(float) #0			declare float @__log2f_finite(float) #0

	; CHECK-LABEL: @log2_f32			; CHECK-LABEL: @log2_f32
	; CHECK: <4 x float> @__svml_log2f4			; CHECK: <4 x float> @__svml_log2f4
	; CHECK: ret			; CHECK: ret
	define void @log2_f32(float* nocapture %varray) {			define void @log2_f32(float* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body
	Show All 15 Lines

	!81 = distinct !{!21, !22, !23}			!81 = distinct !{!21, !22, !23}
	!82 = !{!"llvm.loop.vectorize.width", i32 4}			!82 = !{!"llvm.loop.vectorize.width", i32 4}
	!83 = !{!"llvm.loop.vectorize.enable", i1 true}			!83 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__log2_finite(double) #0			declare double @__log2_finite(double) #0

	; CHECK-LABEL: @log2_f64			; CHECK-FORCE-VF4-LABEL: @log2_f64
	; CHECK: <4 x double> @__svml_log24			; CHECK-FORCE-VF4: <4 x double> @__svml_log24
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @log2_f64(double* nocapture %varray) {			define void @log2_f64(double* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call fast double @__log2_finite(double %conv)			%call = tail call fast double @__log2_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!91 = distinct !{!31, !32, !33}
	!92 = !{!"llvm.loop.vectorize.width", i32 4}
	!93 = !{!"llvm.loop.vectorize.enable", i1 true}

	declare float @__log10f_finite(float) #0			declare float @__log10f_finite(float) #0

	; CHECK-LABEL: @log10_f32			; CHECK-LABEL: @log10_f32
	; CHECK: <4 x float> @__svml_log10f4			; CHECK: <4 x float> @__svml_log10f4
	; CHECK: ret			; CHECK: ret
	define void @log10_f32(float* nocapture %varray) {			define void @log10_f32(float* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body
	Show All 15 Lines

	!101 = distinct !{!21, !22, !23}			!101 = distinct !{!21, !22, !23}
	!102 = !{!"llvm.loop.vectorize.width", i32 4}			!102 = !{!"llvm.loop.vectorize.width", i32 4}
	!103 = !{!"llvm.loop.vectorize.enable", i1 true}			!103 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__log10_finite(double) #0			declare double @__log10_finite(double) #0

	; CHECK-LABEL: @log10_f64			; CHECK-FORCE-VF4-LABEL: @log10_f64
	; CHECK: <4 x double> @__svml_log104			; CHECK-FORCE-VF4: <4 x double> @__svml_log104
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @log10_f64(double* nocapture %varray) {			define void @log10_f64(double* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call fast double @__log10_finite(double %conv)			%call = tail call fast double @__log10_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!111 = distinct !{!31, !32, !33}
	!112 = !{!"llvm.loop.vectorize.width", i32 4}
	!113 = !{!"llvm.loop.vectorize.enable", i1 true}

	declare float @__sqrtf_finite(float) #0			declare float @__sqrtf_finite(float) #0

	; CHECK-LABEL: @sqrt_f32			; CHECK-LABEL: @sqrt_f32
	; CHECK: <4 x float> @__svml_sqrtf4			; CHECK: <4 x float> @__svml_sqrtf4
	; CHECK: ret			; CHECK: ret
	define void @sqrt_f32(float* nocapture %varray) {			define void @sqrt_f32(float* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body
	Show All 15 Lines

	!121 = distinct !{!21, !22, !23}			!121 = distinct !{!21, !22, !23}
	!122 = !{!"llvm.loop.vectorize.width", i32 4}			!122 = !{!"llvm.loop.vectorize.width", i32 4}
	!123 = !{!"llvm.loop.vectorize.enable", i1 true}			!123 = !{!"llvm.loop.vectorize.enable", i1 true}


	declare double @__sqrt_finite(double) #0			declare double @__sqrt_finite(double) #0

	; CHECK-LABEL: @sqrt_f64			; CHECK-FORCE-VF4-LABEL: @sqrt_f64
	; CHECK: <4 x double> @__svml_sqrt4			; CHECK-FORCE-VF4: <4 x double> @__svml_sqrt4
	; CHECK: ret			; CHECK-FORCE-VF4: ret
	define void @sqrt_f64(double* nocapture %varray) {			define void @sqrt_f64(double* nocapture %varray) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%tmp = trunc i64 %indvars.iv to i32			%tmp = trunc i64 %indvars.iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call fast double @__sqrt_finite(double %conv)			%call = tail call fast double @__sqrt_finite(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 1000			%exitcond = icmp eq i64 %indvars.iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!131 = distinct !{!31, !32, !33}
	!132 = !{!"llvm.loop.vectorize.width", i32 4}
	!133 = !{!"llvm.loop.vectorize.enable", i1 true}

llvm/test/Transforms/LoopVectorize/metadata-width.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -dce -instcombine -S \| FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=8 -force-vector-interleave=1 -dce -instcombine -S \| FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK: store <8 x i32>			; CHECK: store <8 x i32>
	; CHECK: ret void			; CHECK: ret void
	define void @test1(i32* nocapture %a, i32 %n) #0 {			define void @test1(i32* nocapture %a, i32 %n) #0 {
	entry:			entry:
	%cmp4 = icmp sgt i32 %n, 0			%cmp4 = icmp sgt i32 %n, 0
	br i1 %cmp4, label %for.body, label %for.end			br i1 %cmp4, label %for.body, label %for.end

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
	%0 = trunc i64 %indvars.iv to i32			%0 = trunc i64 %indvars.iv to i32
	store i32 %0, i32* %arrayidx, align 4			store i32 %0, i32* %arrayidx, align 4
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, %n			%exitcond = icmp eq i32 %lftr.wideiv, %n
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void			ret void
	}			}

	attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "frame-pointer"="none" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "frame-pointer"="none" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }

	!0 = !{!0, !1}
	!1 = !{!"llvm.loop.vectorize.width", i32 8}

llvm/test/Transforms/LoopVectorize/preserve-dbg-loc-and-loop-metadata.ll

	; RUN: opt < %s -loop-vectorize -S 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S 2>&1 \| FileCheck %s
	; RUN: opt < %s -debugify -loop-vectorize -S \| FileCheck %s -check-prefix DEBUGLOC			; RUN: opt < %s -debugify -loop-vectorize -force-vector-width=4 -S \| FileCheck %s -check-prefix DEBUGLOC
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	; This test makes sure we don't duplicate the loop vectorizer's metadata			; This test makes sure we don't duplicate the loop vectorizer's metadata
	; while marking them as already vectorized (by setting width = 1), even			; while marking them as already vectorized (by setting width = 1), even
	; at lower optimization levels, where no extra cleanup is done			; at lower optimization levels, where no extra cleanup is done

	; DEBUGLOC-LABEL: define void @_Z3fooPf(			; DEBUGLOC-LABEL: define void @_Z3fooPf(
	; Check that the phi to resume the scalar part of the loop			; Check that the phi to resume the scalar part of the loop
	Show All 28 Lines

llvm/test/Transforms/LoopVectorize/runtime-check.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S \| FileCheck %s		; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S \| FileCheck %s
; RUN: opt < %s -loop-vectorize -disable-basic-aa -S -pass-remarks-analysis='loop-vectorize' 2>&1 \| FileCheck %s -check-prefix=FORCED_OPTSIZE		; RUN: opt < %s -loop-vectorize -force-vector-width=2 -disable-basic-aa -S -pass-remarks-analysis='loop-vectorize' 2>&1 \| FileCheck %s -check-prefix=FORCED_OPTSIZE

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

; Make sure we vectorize this loop:		; Make sure we vectorize this loop:
; int foo(float *a, float *b, int n) {		; int foo(float *a, float *b, int n) {
; for (int i=0; i<n; ++i)		; for (int i=0; i<n; ++i)
; a[i] = b[i] * 3;		; a[i] = b[i] * 3;
; }		; }
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	for.body:
%0 = load i64, i64* %arrayidx, align 8		%0 = load i64, i64* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds i64, i64* %y_p, i64 %indvars.iv		%arrayidx2 = getelementptr inbounds i64, i64* %y_p, i64 %indvars.iv
%1 = load i64, i64* %arrayidx2, align 8		%1 = load i64, i64* %arrayidx2, align 8
%add = add nsw i64 %1, %0		%add = add nsw i64 %1, %0
%arrayidx4 = getelementptr inbounds i64, i64* %z_p, i64 %indvars.iv		%arrayidx4 = getelementptr inbounds i64, i64* %z_p, i64 %indvars.iv
store i64 %add, i64* %arrayidx4, align 8		store i64 %add, i64* %arrayidx4, align 8
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 128		%exitcond = icmp eq i64 %indvars.iv.next, 128
br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !12		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

!llvm.module.flags = !{!0, !1}		!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!9}		!llvm.dbg.cu = !{!9}
!0 = !{i32 2, !"Dwarf Version", i32 4}		!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}		!1 = !{i32 2, !"Debug Info Version", i32 3}

!2 = !{}		!2 = !{}
!3 = !DISubroutineType(types: !2)		!3 = !DISubroutineType(types: !2)
!4 = !DIFile(filename: "test.cpp", directory: "/tmp")		!4 = !DIFile(filename: "test.cpp", directory: "/tmp")
!5 = distinct !DISubprogram(name: "foo", scope: !4, file: !4, line: 99, type: !3, isLocal: false, isDefinition: true, scopeLine: 100, flags: DIFlagPrototyped, isOptimized: false, unit: !9, retainedNodes: !2)		!5 = distinct !DISubprogram(name: "foo", scope: !4, file: !4, line: 99, type: !3, isLocal: false, isDefinition: true, scopeLine: 100, flags: DIFlagPrototyped, isOptimized: false, unit: !9, retainedNodes: !2)
!6 = !DILocation(line: 100, column: 1, scope: !5)		!6 = !DILocation(line: 100, column: 1, scope: !5)
!7 = !DILocation(line: 101, column: 1, scope: !5)		!7 = !DILocation(line: 101, column: 1, scope: !5)
!8 = !DILocation(line: 102, column: 1, scope: !5)		!8 = !DILocation(line: 102, column: 1, scope: !5)
!9 = distinct !DICompileUnit(language: DW_LANG_C99, producer: "clang",		!9 = distinct !DICompileUnit(language: DW_LANG_C99, producer: "clang",
file: !10,		file: !10,
isOptimized: true, flags: "-O2",		isOptimized: true, flags: "-O2",
splitDebugFilename: "abc.debug", emissionKind: 2)		splitDebugFilename: "abc.debug", emissionKind: 2)
!10 = !DIFile(filename: "path/to/file", directory: "/path/to/dir")		!10 = !DIFile(filename: "path/to/file", directory: "/path/to/dir")
!11 = !{i32 2, !"Debug Info Version", i32 3}		!11 = !{i32 2, !"Debug Info Version", i32 3}
!12 = distinct !{!12, !13, !14}
!13 = !{!"llvm.loop.vectorize.width", i32 2}
!14 = !{!"llvm.loop.vectorize.enable", i1 true}

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll

This file was added.

				; RUN: opt -loop-vectorize -debug-only=loop-vectorize -disable-output -S < %s 2>&1 \| FileCheck %s

				; Make sure that we report unsafe user specified vectorization factor.

				fhahn (Not Done): It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (this might be easier as a target-specific test for a specific architecture).

				c-rhodes (Author, Not Done):
				> It might also be interesting to add a test case where the user-provided VF is large (say 64)…
				Is this loop like what you had in mind?

				  void foo(int *a, int *b, int N) {
				    #pragma clang loop vectorize(enable) vectorize_width(64)
				    for (int i=0; i<N; ++i) {
				      a[i + 32] = a[i] / b[i];
				    }
				  }

				When compiling with:

				  ./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu

				the user VF of 64 is unsafe, so it is clamped to 32. The vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), but because vectorization is forced, VF=32 is still chosen.

				c-rhodes (Author, Not Done):
				> It might also be interesting to add a test case where the user-provided VF is large (say 64)…
				I might be missing something, but I don't think this test will add any value. I suggest landing this patch as is, @fhahn, unless you feel strongly about it?

				fhahn (Not Done):
				> I might be missing something, but I don't think this test will add any value. I suggest landing this patch as is, @fhahn, unless you feel strongly about it?
				Agreed, such a test doesn't really add much. What I was suggesting was one where the cost model does pick a different VF than the maximum safe one. This is the case that should be handled differently with the current version compared to the first version.
				I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.

				  define void @test(i64* nocapture %a, i64* nocapture readonly %b) {
				  entry:
				    br label %loop.header

				  loop.header:
				    %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
				    %arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
				    %0 = load i64, i64* %arrayidx, align 4
				    %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 %iv
				    %1 = load i64, i64* %arrayidx2, align 4
				    %add = add nsw i64 %1, %0
				    %2 = add nuw nsw i64 %iv, 16
				    %arrayidx5 = getelementptr inbounds i64, i64* %a, i64 %2
				    %c = icmp eq i64 %1, 120
				    br i1 %c, label %then, label %latch

				  then:
				    store i64 %add, i64* %arrayidx5, align 4
				    br label %latch

				  latch:
				    %iv.next = add nuw nsw i64 %iv, 1
				    %exitcond.not = icmp eq i64 %iv.next, 1024
				    br i1 %exitcond.not, label %exit, label %loop.header, !llvm.loop !0

				  exit:
				    ret void
				  }

				  !0 = !{!0, !1, !2}
				  !1 = !{!"llvm.loop.vectorize.width", i64 32}
				  !2 = !{!"llvm.loop.vectorize.enable", i1 true}

				c-rhodes (Author, Done):
				> I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.
				That makes sense, cheers. I've added the test.

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; void foo(int *a, int *b, int N) {
				;   #pragma clang loop vectorize(enable) vectorize_width(4)
				;   for (int i=0; i<N; ++i) {
				;     a[i + 2] = a[i] + b[i];
				;   }
				; }

				; CHECK: LV: User VF=4 is unsafe, using maximum safe VF=2. This can be overridden with '-force-vector-width=X'.
				define void @foo(i32* nocapture %a, i32* nocapture readonly %b, i32 %N) {
				entry:
				%cmp12 = icmp sgt i32 %N, 0
				br i1 %cmp12, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %N to i64
				fhahn (Not Done): nit: Is this required? LV will add a minimum iteration check anyway, so this could be dropped to make the IR more compact. Alternatively, we could use a constant trip count.
				c-rhodes (Author, Not Done):
				> nit: Is this required? LV will add a minimum iteration check anyway, so this could be…
				Changed it to use a constant trip count.
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void
				fhahn (Not Done): nit: Is this required? We could just change `%N` to be an `i64` to make the IR more compact.

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx2, align 4
				fhahn (Done): nit: can we drop the `indvars.` prefix to make the IR slightly more readable?
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %indvars.iv, 2
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !0
				}

				!0 = !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.enable", i1 true}
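
The clamping behaviour exercised by this test and discussed in the inline comments can be illustrated with a small C sketch. It is not the code from LoopVectorize.cpp: the function names `pow2_floor`, `max_safe_vf` and `clamp_user_vf` are hypothetical, and the sketch assumes that vectorization factors are powers of two and that a backward dependence of `distance` elements permits at most `distance` lanes.

```c
/* Hypothetical helper: largest power of two that is <= n.
 * Returns 1 for n == 0 or n == 1. */
unsigned pow2_floor(unsigned n) {
    unsigned p = 1;
    while (p <= n / 2)   /* p * 2 <= n, so the multiply cannot overflow */
        p *= 2;
    return p;
}

/* Hypothetical helper: widest power-of-two VF that does not exceed a
 * dependence distance given in elements (assumption: distance >= 1). */
unsigned max_safe_vf(unsigned distance) {
    return pow2_floor(distance);
}

/* Clamp a user-requested VF (from a loop hint) to the maximum safe VF.
 * A hint that is already safe is returned unchanged. */
unsigned clamp_user_vf(unsigned user_vf, unsigned safe_vf) {
    return user_vf <= safe_vf ? user_vf : safe_vf;
}
```

Under these assumptions, the loop in this test (hint of 4, distance 2) gives `clamp_user_vf(4, max_safe_vf(2)) == 2`, and the example from the review thread (hint of 64, distance 32) gives 32. The sketch covers only the legality clamp; whether the cost model then picks a smaller VF is a separate step that it does not model.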