This is an archive of the discontinued LLVM Phabricator instance.

Differential D90687

[LV] Clamp VF hint when unsafe
ClosedPublic

Authored by c-rhodes on Nov 3 2020, 8:07 AM.

Details

Reviewers

sdesmalen
efriedma
fhahn
mkuper
RKSimon
spatel
Meinersbur

Commits

rGcba4accda08f: [LV] Clamp VF hint when unsafe

Summary

In the following loop the dependence distance is 2, so the loop can only be
vectorized if the vector length is no larger than this.

void foo(int *a, int *b, int N) {
  #pragma clang loop vectorize(enable) vectorize_width(4)
  for (int i=0; i<N; ++i) {
    a[i + 2] = a[i] + b[i];
  }
}

However, when specifying a VF of 4 via a loop hint this loop is
vectorized. According to [1][2], loop hints are ignored if the
optimization is not safe to apply.
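To make the hazard concrete, here is a minimal C++ sketch that emulates a vector loop by performing all loads of a chunk before any of its stores. The function names and the chunked emulation are illustrative assumptions and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for: a[i + 2] = a[i] + b[i]. `a` must hold n + 2 elements.
std::vector<int> runScalar(std::vector<int> a, const std::vector<int> &b,
                           std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    a[i + 2] = a[i] + b[i];
  return a;
}

// Emulated vector loop: each chunk of `vf` iterations loads everything first,
// then stores everything, which is how a wide vector load/store pair behaves.
std::vector<int> runVectorized(std::vector<int> a, const std::vector<int> &b,
                               std::size_t n, std::size_t vf) {
  std::size_t i = 0;
  for (; i + vf <= n; i += vf) {
    std::vector<int> tmp(vf);
    for (std::size_t lane = 0; lane < vf; ++lane)
      tmp[lane] = a[i + lane] + b[i + lane]; // all loads of the chunk
    for (std::size_t lane = 0; lane < vf; ++lane)
      a[i + 2 + lane] = tmp[lane];           // all stores of the chunk
  }
  for (; i < n; ++i)                         // scalar remainder
    a[i + 2] = a[i] + b[i];
  return a;
}
```

With a chunk size of 2 the emulation matches the scalar loop. With a chunk size of 4, lanes 2 and 3 read `a[i + 2]` and `a[i + 3]` before lanes 0 and 1 have written them, so the results diverge.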

This patch introduces a check to bail out of vectorization if the
user-specified VF is greater than the maximum feasible VF, unless explicitly
forced with '-force-vector-width=X'.

[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
[2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations

Diff Detail

Event Timeline

c-rhodes created this revision. Nov 3 2020, 8:07 AM

Herald added a project: Restricted Project. Nov 3 2020, 8:07 AM

Herald added a subscriber: hiraditya.

c-rhodes requested review of this revision. Nov 3 2020, 8:07 AM

Harbormaster completed remote builds in B77412: Diff 302584. Nov 3 2020, 9:34 AM

dmgreen added a reviewer: Meinersbur. Nov 3 2020, 9:35 AM

dmgreen added a subscriber: dmgreen.

dmgreen added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing. It should probably be based on Legal.getMaxSafeRegisterWidth()?

fhahn added inline comments. Nov 3 2020, 9:46 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	> It should probably be based on Legal.getMaxSafeRegisterWidth()?
	computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint, if it is not useful, why not just use the largest safe vectorization factors instead of bailing out?
	Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
192 ↗	(On Diff #302584)	Why those test changes?

#pragma clang loop vectorize_width(..) ignoring safety checks is indeed bad. Instead of not vectorizing at all in this case, did you consider using min(UserVF,FeasibleVF)?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	`Legal.getMaxSafeRegisterWidth()` is called within `computeFeasibleMaxVF`. With `computeFeasibleMaxVF` considering additional architecture concerns, can these be just ignored?
5238	With `computeFeasibleMaxVF` already called later, could you store its result for both uses? I suggest moving `UserVF ? UserVF : computeFeasibleMaxVF(TC)`, which is duplicated multiple times below, before this condition. Such as:
	  auto MaxVF = computeFeasibleMaxVF(TC);
	  if (UserVF) {
	    if (UserVF > MaxVF) { ... }
	    MaxVF = UserVF;
	  }

c-rhodes added inline comments. Nov 3 2020, 12:48 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5238	> > It should probably be based on Legal.getMaxSafeRegisterWidth()?
	> computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint, if it is not useful, why not just use the largest safe vectorization factors instead of bailing out?
	Yeah that's right, `computeFeasibleMaxVF` is based on `Legal->getMaxSafeRegisterWidth()` and bounded by the widest register according to the TTI.
	> Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
	Sure, clamping sounds good to me and @Meinersbur suggested this as well, I'll update the patch.
5238	> I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing.
	I did find that the X86 tests changed because of a backend vector width of 128-bit. I expect these changes will still be required when clamping the UserVF to the maximum vectorization factor as suggested.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
192 ↗	(On Diff #302584)	> Why those test changes?
	The maximum VF=2, it's computed as `WidestRegister / WidestType` where the widest register is 128-bit.
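As a worked instance of the `WidestRegister / WidestType` arithmetic, a small C++ sketch follows. The helper name is hypothetical, and the 64-bit widest type is an assumption inferred from the stated result of 2 rather than something the comment spells out.

```cpp
#include <cassert>

// Hypothetical helper: VF bound implied by the target's widest vector register.
// Both arguments are bit widths.
constexpr unsigned registerBoundVF(unsigned widestRegisterBits,
                                   unsigned widestTypeBits) {
  return widestRegisterBits / widestTypeBits;
}
```

Under that assumption, `registerBoundVF(128, 64)` evaluates to 2, matching the VF reported for the test.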

Rather than bail out of vectorization if UserVF > MaxVF, clamp UserVF to the maximum feasible VF.
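The clamping behaviour described in this update can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `clampUserVF` is a hypothetical name, and treating a hint of 0 as "no hint given" mirrors the `UserVF ? UserVF : computeFeasibleMaxVF(TC)` pattern quoted earlier in the review.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: pick the VF bound given an optional user hint.
// userVF == 0 means the user gave no hint.
unsigned clampUserVF(unsigned userVF, unsigned maxFeasibleVF) {
  if (userVF == 0)
    return maxFeasibleVF;                 // no hint: use the computed maximum
  return std::min(userVF, maxFeasibleVF); // hint: never exceed the maximum
}
```

For the loop in the summary, a hint of 4 with a maximum of 2 yields 2 instead of abandoning vectorization.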

dmgreen added inline comments. Nov 4 2020, 10:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	Clamping sounds good. But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers: https://godbolt.org/z/ePdv3K (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here).

Address @dmgreen's comments, response in thread.

c-rhodes added inline comments. Nov 5 2020, 7:40 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.
	So only considering the VF computed by LAA for dependencies rather than the backend register widths, I think that makes sense.
	Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint:
	  void foo(int *a, int *b, int N) {
	    for (int i=0; i<N; ++i) {
	      a[i + 2] = a[i] + b[i];
	    }
	  }
	  clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4
	  ../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
	    for (int i=0; i<N; ++i) {
	    ^
	It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma, LAA is operating on VF=2 and LV on VF=4. What's also interesting is that the loop metadata takes precedence over the flag: in `LoopVectorizationLegality` the vector width is initialized with the flag and then populated with loop metadata, so the VF according to LAA would come from the flag and in LV from the loop hint, assuming both were specified by the user.
	I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to the VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
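The precedence described here (flag first, then overwritten by loop metadata) can be modelled with a short C++ sketch. The struct and function names are hypothetical and only mirror the order of initialization stated in the comment, not the actual `LoopVectorizationLegality` code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the two places a requested width can come from.
struct WidthSources {
  unsigned flagWidth;                // command-line flag value, 0 if unset
  std::optional<unsigned> hintWidth; // loop metadata value, if present
};

// Start from the flag, then let loop metadata overwrite it, so the metadata
// wins whenever both are given.
unsigned effectiveHintWidth(const WidthSources &s) {
  unsigned width = s.flagWidth;
  if (s.hintWidth)
    width = *s.hintWidth;
  return width;
}
```

In this model a flag of 4 combined with a metadata width of 8 resolves to 8, while the flag alone resolves to 4.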

sdesmalen added inline comments. Nov 5 2020, 7:46 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	That's a good point @dmgreen, `MaxSafeRegisterWidth` and the corresponding `getMaxSafeRegisterWidth` in LoopVectorize.cpp and LoopAccessAnalysis.cpp are actually misnomers because it isn't the maximum safe register width that is calculated, but rather the maximum safe vector bitwidth.
	Without this patch, this example is vectorized with VF=8 when compiling for Neon (128bit vectors):
	  void foo(int *a, int *b, int c, int N) {
	    #pragma clang loop vectorize(enable) vectorize_width(8)
	    for (int i=0; i<N; ++i) {
	      a[i + 16] = a[i] + b[i];
	    }
	  }
	Where with this patch, it is now vectorized with VF=4. It seems like the limitation with regards to the actual physical vector register can and should be removed.
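The difference between the two bounds in this example can be checked with a little arithmetic, sketched below in C++. The helpers are hypothetical, `int` is assumed to be 32 bits, and the safe vector bitwidth is modelled simply as dependence distance times element size, which is a simplification of what the dependence analysis computes.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model (assumption): the widest safe vector, in bits, spans
// exactly the dependence distance.
constexpr unsigned maxSafeVectorBits(unsigned depDistanceElems,
                                     unsigned elemBits) {
  return depDistanceElems * elemBits;
}

// VF bound from the dependence (a legality constraint).
constexpr unsigned safetyBoundVF(unsigned depDistanceElems, unsigned elemBits) {
  return maxSafeVectorBits(depDistanceElems, elemBits) / elemBits;
}

// VF bound from the physical register size (a target property).
constexpr unsigned registerBoundVF(unsigned registerBits, unsigned elemBits) {
  return registerBits / elemBits;
}
```

For a distance of 16 `int` elements the safe bitwidth is 512, so the requested VF of 8 is within the safety bound of 16. Only the 128-bit register bound of 4 would reduce it, which is the clamping to VF=4 reported above.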

sdesmalen added inline comments. Nov 5 2020, 7:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
	To me they read as slightly different things, but it's good to clear their semantics up. I interpret:
	`-force-vector-width` as "vectorize it with this width and this width only, and fail if not legal".
	The `LoopHint` as "try to vectorize with this width but if not legal, feel free to ignore the hint and pick a different width".
	At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?
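The two interpretations above can be contrasted in a small C++ sketch. All names are hypothetical. Falling back to the largest safe width for the hint case is just one possible choice: the quoted interpretation only says that a different width may be picked.

```cpp
#include <cassert>
#include <optional>

enum class WidthSource { ForceFlag, LoopHint };

// Returns the chosen VF, or std::nullopt if the loop is left unvectorized.
std::optional<unsigned> resolveWidth(WidthSource source, unsigned requested,
                                     unsigned maxSafeVF) {
  if (requested <= maxSafeVF)
    return requested;      // safe: honour the request in both cases
  if (source == WidthSource::ForceFlag)
    return std::nullopt;   // flag: "this width only, and fail if not legal"
  return maxSafeVF;        // hint: free to pick a different (here: safe) width
}
```

With a requested width of 4 and a maximum safe width of 2, the flag variant gives up while the hint variant falls back to 2.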

Meinersbur added inline comments. Nov 9 2020, 4:26 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	`-force-vector-width` is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.
5340	[style] LLVM coding style prefers explicit types in declarations.
5344–5346	Should there be (also) a diagnostic warning (-Rpass) to inform the user that the value has been clamped? (Or maybe there is already and I don't see where it is done)

fhahn added inline comments. Nov 10 2020, 2:35 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	> I've updated the patch such that computeFeasibleMaxVF now takes UserVF and clamps this to VF based on Legal->getMaxSafeRegisterWidth(); if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of UserVF needs refactoring. Now I know -force-vector-width won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
	The logic looks good I think. My only concern is that passing UserVF even further down seems to make handling of it even more complicated to follow, but I don't think there's a good way around that because we need some info not available here.
	> At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?
	> -force-vector-width is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, it shouldn't matter which semantics we choose. Personally, I'd choose the LoopHint semantics to reduce the number of code paths.
	Yes that is how it is used today AFAIK, to write LV tests independent of cost-modeling. Agreed with @Meinersbur that it should not matter for regression tests, because I think it is mostly used for testing codegen in legal scenarios. If we decide to adjust the behavior, that's probably best done in a separate patch.
5344–5346	I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (`OptimizationRemarkAnalysis` seems to be a suitable category here).

Fix style issue.
Add optimization remark when clamping.
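The idea of reporting the clamp only when it actually happens can be sketched as below. This is a standalone C++ illustration: the function name is hypothetical, the message wording is made up for the example and is not LLVM's remark text, and no LLVM remark API is used.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical: returns a user-facing note only when the hint was clamped.
// The wording is illustrative and does not reproduce the patch's remark.
std::optional<std::string> clampRemark(unsigned userVF, unsigned maxSafeVF) {
  if (userVF <= maxSafeVF)
    return std::nullopt; // hint honoured as-is, nothing to report
  return "user-specified VF " + std::to_string(userVF) +
         " is unsafe, clamping to " + std::to_string(maxSafeVF);
}
```

Returning an empty optional for safe hints keeps the note separate from the usual remark about the chosen VF, which is the point raised in the review.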

c-rhodes marked 3 inline comments as done. Nov 10 2020, 9:03 AM

c-rhodes added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5237	@fhahn @Meinersbur thanks for clarifying
5344–5346	> I think we currently would only issue a remark with the chosen VF, but it would probably good to add a separate remark so it is easy for users to spot (OptimizationRemarkAnalysis seems to be a suitable category here).
	Done, thanks for pointing out OptimizationRemarkAnalysis

Thanks, looks good to me.

This revision is now accepted and ready to land. Nov 10 2020, 1:46 PM

LGTM as well, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5345	nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).

LGTM as well. It seems the current patch fixes the example I pasted in my previous comment.

> LGTM as well. It seems the current patch fixes the example I pasted in my previous comment.

Yeah, Thanks for that.

c-rhodes added inline comments. Nov 11 2020, 5:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5345	> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
	Good spot, I'll fix it before merging or update this patch once I get a better idea about the other suggestion you made.
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
	Is this loop like what you had in mind?
	  void foo(int *a, int *b, int N) {
	    #pragma clang loop vectorize(enable) vectorize_width(64)
	    for (int i=0; i<N; ++i) {
	      a[i + 32] = a[i] / b[i];
	    }
	  }
	When compiling with:
	  ./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu
	The user VF of 64 is unsafe so it's clamped to 32, and the vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), although the vectorization is forced so VF=32 is still chosen.
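The outcome described in this comment, where a forced width is kept even though it looks more expensive, can be written as a tiny decision function. This is a C++ sketch with hypothetical names, and it treats the two quoted cost numbers as directly comparable, as the comment does.

```cpp
#include <cassert>

// Hypothetical sketch: decide between a given vector width and the scalar loop.
unsigned chooseWidth(unsigned vf, unsigned vectorCost, unsigned scalarCost,
                     bool forced) {
  if (forced)
    return vf;                             // forced: skip the profitability check
  return vectorCost < scalarCost ? vf : 1; // otherwise stay scalar unless cheaper
}
```

With the numbers quoted above (cost 13 for width 32, cost 10 for scalar), the forced case keeps VF=32 while an unforced comparison would keep the scalar loop.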

Rebased (computeFeasibleMaxVF now returns an ElementCount).
Address one of @fhahn's comments.
Added an assert at the top of computeFeasibleMaxVF that UserVF isn't scalable.
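A guard of the kind mentioned in the last line might look like the following C++ sketch. `ElemCount` is a simplified stand-in invented for this example (a minimum lane count plus a scalable flag). It does not reproduce the interface of LLVM's `ElementCount`, and the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for an element count.
struct ElemCount {
  unsigned min;  // minimum number of lanes
  bool scalable; // true if the real lane count is a runtime multiple of `min`
};

// Clamps a fixed-width user VF; scalable hints are rejected up front because
// this sketch has no way to compare them against a fixed safe maximum.
unsigned clampFixedUserVF(ElemCount userVF, unsigned maxSafeVF) {
  assert(!userVF.scalable && "scalable VF hints are not handled here");
  return userVF.min > maxSafeVF ? maxSafeVF : userVF.min;
}
```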

c-rhodes marked an inline comment as done. Nov 18 2020, 5:31 AM

c-rhodes added a child revision: D91718: [LV] Legalize scalable VF hints. Nov 18 2020, 8:32 AM

c-rhodes added inline comments. Nov 20 2020, 3:24 AM

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
	I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?

fhahn added inline comments. Nov 24 2020, 11:37 AM

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> > It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
	> I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?
	Agreed, such a test doesn't really add much. What I was suggesting was one where the cost model does pick a different VF than the maximum safe one. This is the case that should be handled differently with the current version compared to the first version.
	I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.
	  define void @test(i64* nocapture %a, i64* nocapture readonly %b) {
	  entry:
	    br label %loop.header

	  loop.header:
	    %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	    %arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
	    %0 = load i64, i64* %arrayidx, align 4
	    %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 %iv
	    %1 = load i64, i64* %arrayidx2, align 4
	    %add = add nsw i64 %1, %0
	    %2 = add nuw nsw i64 %iv, 16
	    %arrayidx5 = getelementptr inbounds i64, i64* %a, i64 %2
	    %c = icmp eq i64 %1, 120
	    br i1 %c, label %then, label %latch

	  then:
	    store i64 %add, i64* %arrayidx5, align 4
	    br label %latch

	  latch:
	    %iv.next = add nuw nsw i64 %iv, 1
	    %exitcond.not = icmp eq i64 %iv.next, 1024
	    br i1 %exitcond.not, label %exit, label %loop.header, !llvm.loop !0

	  exit:
	    ret void
	  }

	  !0 = !{!0, !1, !2}
	  !1 = !{!"llvm.loop.vectorize.width", i64 32}
	  !2 = !{!"llvm.loop.vectorize.enable", i1 true}
21	nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.
25	nit: Is this required? We could just change `%N` to be a `i64` to make the IR more compact.
32	nit: can we drop the `indvars.` prefix to make the IR slightly more readable?
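The scenario in the proposed test (a request for 32, a safe maximum of 16, and a cost model that prefers something smaller still) can be modelled with a short C++ sketch. The function name is hypothetical and any cost function passed in is made up for illustration. It does not reflect LLVM's actual cost model.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: pick the cheapest power-of-two VF in [1, maxVF].
// `costPerLane` returns an estimated per-lane cost for a given VF; ties keep
// the smaller VF.
unsigned pickCheapestVF(unsigned maxVF,
                        const std::function<double(unsigned)> &costPerLane) {
  unsigned best = 1;
  double bestCost = costPerLane(1);
  for (unsigned vf = 2; vf <= maxVF; vf *= 2) {
    double cost = costPerLane(vf);
    if (cost < bestCost) {
      bestCost = cost;
      best = vf;
    }
  }
  return best;
}
```

Clamping only changes the upper bound of the search: passing 16 rather than 32 as `maxVF` removes the unsafe candidate but still lets a cheaper width such as 2 win.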

Address Florian's comments

c-rhodes marked 2 inline comments as done. Nov 25 2020, 4:03 AM

c-rhodes added inline comments.

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll
4	> I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with opt -loop-vectorize -mtriple=arm64-apple-iphoneos, the cost model should pick VF = 2 instead of the higher alternatives.
	That makes sense cheers, I've added the test
21	> nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.
	Changed it to use a constant trip count

c-rhodes mentioned this in D91718: [LV] Legalize scalable VF hints. Nov 25 2020, 7:41 AM

I think I've addressed all comments now, so I'll land this at the beginning of next week unless there are any objections before then. Thanks all for reviewing!

Closed by commit rGcba4accda08f: [LV] Clamp VF hint when unsafe (authored by c-rhodes). Dec 1 2020, 3:31 AM

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rGcba4accda08f: [LV] Clamp VF hint when unsafe.

sdesmalen mentioned this in D91077: [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF.. Dec 8 2020, 6:17 AM

I configured and built llvm with:
cmake -G Ninja -DLLVM_TARGETS_TO_BUILD:STRING=Hexagon -DLLVM_DEFAULT_TARGET_TRIPLE:STRING=hexagon-unknown-elf -DLLVM_TARGET_ARCH:STRING=hexagon-unknown-elf -DLLVM_ENABLE_ASSERTIONS:BOOL=ON -DLLVM_PARALLEL_LINK_JOBS=1 '-DLLVM_ENABLE_PROJECTS=llvm;clang' -DBUILD_SHARED_LIBS:BOOL=ON ..

This patch causes an assert with this testcase:
typedef struct {
  char a;
} b;
b *c;
int d, e;
int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}
clang -Os -mhvx -fvectorize -mv67 testcase.i -S -o -

llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed.

This revision is now accepted and ready to land. Jan 15 2021, 8:53 AM

clang can also be configured for all targets
cmake -G Ninja -DLLVM_ENABLE_ASSERTIONS:BOOL=ON -DLLVM_PARALLEL_LINK_JOBS=1 '-DLLVM_ENABLE_PROJECTS=llvm;clang' -DBUILD_SHARED_LIBS:BOOL=ON ..

clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.i -S -o -

In D90687#2501292, @iajbar wrote:
I configured and built llvm with:
cmake -G Ninja -DLLVM_TARGETS_TO_BUILD:STRING=Hexagon -DLLVM_DEFAULT_TARGET_TRIPLE:STRING=hexagon-unknown-elf -DLLVM_TARGET_ARCH:STRING=hexagon-unknown-elf -DLLVM_ENABLE_ASSERTIONS:BOOL=ON -DLLVM_PARALLEL_LINK_JOBS=1 '-DLLVM_ENABLE_PROJECTS=llvm;clang' -DBUILD_SHARED_LIBS:BOOL=ON ..

This patch causes an assert with this testcase:
typedef struct {
  char a;
} b;
b *c;
int d, e;
int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}
clang -Os -mhvx -fvectorize -mv67 testcase.i -S -o -

llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed.

Thanks for reporting. I reproduced the crash but found it also occurs before this landed (I tried commit eeba70a), so I'm not sure it's this patch causing the issue. Could you confirm this?

The patch that causes the assert is patch cba4accda08f90. I tried commit c63799fc52ff247 that is before your patch and there is no assert. Thank you.

c-rhodes mentioned this in D94869: [LV] Fix crash when computing max VF too early. Jan 16 2021, 9:00 AM

In D90687#2501881, @iajbar wrote:

The patch that causes the assert is patch cba4accda08f90. I tried commit c63799fc52ff247 that is before your patch and there is no assert. Thank you.

Apologies, I got this patch mixed up with D91718. You're right the crash is introduced by this patch. I've posted a fix D94869, see patch for more details.

Unrelated to this issue but might be of interest to you, I hit a crash:

Loop IV: clang: /home/culrho01/llvm-project/llvm/include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From*>::doit(const From*) [with To = llvm::Instruction; From = llvm::Value]: Assertion `Val && "isa<> used on a null pointer"' failed.

when compiling your testcase with -mllvm -debug. Full invocation: ./bin/clang -Os -mhvx -fvectorize -mv67 ../testcase.c -S -o - -mllvm -debug.

bjope added a subscriber: bjope. Jan 17 2021, 12:56 AM

Thanks Cullen! I tried your fix D94869 and my benchmark passes.

c-rhodes mentioned this in rG8cda227432f1: [LV] Fix crash when computing max VF too early. Feb 1 2021, 4:15 AM

In D90687#2512510, @iajbar wrote:

Thanks Cullen! I tried your fix D94869 and my benchmark passes.

Fix has landed, closing this again.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	32 lines
llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll	46 lines

Diff 303119

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[1,436 lines not shown] public:
   }

 private:
   unsigned NumPredStores = 0;

   /// \return An upper bound for the vectorization factor, a power-of-2 larger
   /// than zero. One is returned if vectorization should best be avoided due
   /// to cost.
-  unsigned computeFeasibleMaxVF(unsigned ConstTripCount);
+  unsigned computeFeasibleMaxVF(unsigned ConstTripCount, unsigned UserVF);

   /// The vectorization cost is a combination of the cost itself and a boolean
   /// indicating whether any of the contributing operations will actually
   /// operate on
   /// vector values after type legalization in the backend. If this latter value
   /// is
   /// false, then all operations will be scalarized (i.e. no vectorization has
   /// actually taken place).
[3,775 lines not shown] Optional<unsigned> LoopVectorizationCostModel::computeMaxVF(unsigned UserVF,
   LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
   if (TC == 1) {
     reportVectorizationFailure("Single iteration (non) loop",
         "loop trip count is one, irrelevant for vectorization",
         "SingleIterationLoop", ORE, TheLoop);
     return None;
   }

+  auto MaxVF = computeFeasibleMaxVF(TC, UserVF);
		dmgreenUnsubmitted Not Done Reply Inline Actions Clamping sounds good. But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers: https://godbolt.org/z/ePdv3K (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here). dmgreen: Clamping sounds good. But I think that there is a difference between the "maximum safe vector…
		c-rhodesAuthorUnsubmitted Not Done Reply Inline Actions But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers. So only considering the VF computed by LAA for dependencies rather the backend register widths, I think that makes sense. Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint: void foo(int a, int b, int N) { for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4 ../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize] for (int i=0; i<N; ++i) { ^ It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma LAA is operating on VF=2 and LV on VF=4. What's also interesting is the loop metadata takes precedence over the flag since in `LoopVectorizationLegality` the vector width is initialized with the flag then populated with loop metadata, so the VF according to LAA would come from the flag and in LV the loop hint, assuming both were specified by the user that is. I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts? c-rhodes: > But I think that there is a difference between the "maximum safe vector width" and "the…
		sdesmalenUnsubmitted Not Done Reply Inline Actions the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts? To me they read as slightly different things, but it's good to clear their semantics up. I interpret: `-force-vector-width` as "vectorize it with this width and this width only, and fail if not legal". The `LoopHint` as "try to vectorize with this width but if not legal, feel free to ignore the hint and pick a different width". At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that? sdesmalen: > the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm…
		MeinersburUnsubmitted Not Done Reply Inline Actions `-force-vector-width` is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, we it shouldn't matter which semantics we chose. Personally, I'd chose the LoopHint semantics to reduce the number of code paths. Meinersbur: `-force-vector-width` is internal and should be used for regression tests only. Whether it…
		fhahnUnsubmitted Not Done Reply Inline Actions I've updated the patch such that computeFeasibleMaxVF now takes UserVF and clamps this to VF based on Legal->getMaxSafeRegisterWidth(); if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of UserVF needs refactoring. Now I know -force-vector-width won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts? The logic looks good I think. My only concern is that passing UserVF even further down seems to make handling of it even more complicated to follow, but I don't think there's a good way around that because we need some info not available here. At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that? -force-vector-width is internal and should be used for regression tests only. Whether it applies a different vector width or does not vectorize at all, the regression tests should fail. Because it is internal, we it shouldn't matter which semantics we chose. Personally, I'd chose the LoopHint semantics to reduce the number of code paths. Yes that is how it is used today AFAIK, to write LV tests independent of cost-modeling. Agreed with @Meinersbur that it should not matter for regression tests, because I think it is mostly used for testing codegen in legal scenarios. If we decide to adjust the behavior, that's probably best done in a separate patch. fhahn: > I've updated the patch such that computeFeasibleMaxVF now takes UserVF and clamps this to VF…
		c-rhodesAuthorUnsubmitted Not Done Reply Inline Actions @fhahn @Meinersbur thanks for clarifying c-rhodes: @fhahn @Meinersbur thanks for clarifying
		sdesmalenUnsubmitted Not Done Reply Inline Actions That's a good point @dmgreen, `MaxSafeRegisterWidth` and corresponding `getMaxSafeRegisterWidth` in LoopVectorize.cpp and LoopAccessAnalysis.cpp are actually misnomers because it isn't the maximum safe register width that is calculated, but rather the maximum safe vector bitwidth. Without this patch, this example is vectorized with VF=8 when compiling for Neon (128bit vectors): void foo(int a, int b, int c, int N) { #pragma clang loop vectorize(enable) vectorize_width(8) for (int i=0; i<N; ++i) { a[i + 16] = a[i] + b[i]; } } Where with this patch, it is now vectorized with VF=4. It seems like the limitation with regards to the actual physical vector register can and should be removed. sdesmalen: That's a good point @dmgreen, `MaxSafeRegisterWidth` and corresponding…

		dmgreen: I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing. It should probably be based on Legal.getMaxSafeRegisterWidth()?
		fhahn:
		> It should probably be based on Legal.getMaxSafeRegisterWidth()?
		computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint if it is not useful, why not just use the largest safe vectorization factor instead of bailing out?
		Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
		c-rhodes (Author):
		>> It should probably be based on Legal.getMaxSafeRegisterWidth()?
		> computeFeasibleMaxVF should only return legal vectorization factors (using getMaxSafeRegisterWidth) I think. Given that we are free to ignore the hint if it is not useful, why not just use the largest safe vectorization factor instead of bailing out?
		Yeah that's right, `computeFeasibleMaxVF` is based on `Legal->getMaxSafeRegisterWidth()` and bounded by the widest register according to the TTI.
		> Unfortunately the handling of UserVF is a bit of a mess, but I think it might be preferable to clamp the UserVF to the maximum vectorization factor in the caller of computeMaxVF where UserVF is actually used.
		Sure, clamping sounds good to me and @Meinersbur suggested this as well, I'll update the patch.
		Meinersbur: `Legal.getMaxSafeRegisterWidth()` is called within `computeFeasibleMaxVF`. With `computeFeasibleMaxVF` considering additional architecture concerns, can these be just ignored?
		c-rhodes (Author):
		> I don't think computeFeasibleMaxVF is the maximum safe width. It does a lot of things and is largely based on the backend vector widths. I'm guessing that's why so many tests are changing.
		I did find for the X86 tests that have changed it was because of a backend vector width of 128-bit, I expect these changes will still be required when clamping the UserVF to the maximum vectorization factor as suggested.
		Meinersbur: With `computeFeasibleMaxVF` already called later, could you store its result for both uses? I suggest moving `UserVF ? UserVF : computeFeasibleMaxVF(TC)`, which is duplicated multiple times below, before this condition. Such as:
		  auto MaxVF = computeFeasibleMaxVF(TC);
		  if (UserVF) {
		    if (UserVF > MaxVF) { ... }
		    MaxVF = UserVF;
		  }
switch (ScalarEpilogueStatus) {		switch (ScalarEpilogueStatus) {
case CM_ScalarEpilogueAllowed:		case CM_ScalarEpilogueAllowed:
return UserVF ? UserVF : computeFeasibleMaxVF(TC);		return MaxVF;
case CM_ScalarEpilogueNotNeededUsePredicate:		case CM_ScalarEpilogueNotNeededUsePredicate:
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: vector predicate hint/switch found.\n"		dbgs() << "LV: vector predicate hint/switch found.\n"
<< "LV: Not allowing scalar epilogue, creating predicated "		<< "LV: Not allowing scalar epilogue, creating predicated "
<< "vector loop.\n");		<< "vector loop.\n");
break;		break;
case CM_ScalarEpilogueNotAllowedLowTripLoop:		case CM_ScalarEpilogueNotAllowedLowTripLoop:
// fallthrough as a special case of OptForSize		// fallthrough as a special case of OptForSize
Show All 19 Lines	Optional<unsigned> LoopVectorizationCostModel::computeMaxVF(unsigned UserVF,
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
}		}

unsigned MaxVF = UserVF ? UserVF : computeFeasibleMaxVF(TC);		assert(isPowerOf2_32(MaxVF) && "MaxVF must be a power of 2");
assert((UserVF || isPowerOf2_32(MaxVF)) && "MaxVF must be a power of 2");
unsigned MaxVFtimesIC = UserIC ? MaxVF * UserIC : MaxVF;		unsigned MaxVFtimesIC = UserIC ? MaxVF * UserIC : MaxVF;
if (TC > 0 && TC % MaxVFtimesIC == 0) {		if (TC > 0 && TC % MaxVFtimesIC == 0) {
// Accept MaxVF if we do not have a tail.		// Accept MaxVF if we do not have a tail.
LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");		LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
return MaxVF;		return MaxVF;
}		}

// If we don't know the precise trip count, or if the trip count that we		// If we don't know the precise trip count, or if the trip count that we
Show All 31 Lines	reportVectorizationFailure(
"cannot optimize for size and vectorize at the same time. "		"cannot optimize for size and vectorize at the same time. "
"Enable vectorization of this loop with '#pragma clang loop "		"Enable vectorization of this loop with '#pragma clang loop "
"vectorize(enable)' when compiling with -Os/-Oz",		"vectorize(enable)' when compiling with -Os/-Oz",
"NoTailLoopWithOptForSize", ORE, TheLoop);		"NoTailLoopWithOptForSize", ORE, TheLoop);
return None;		return None;
}		}

unsigned		unsigned
LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount) {		LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
		unsigned UserVF) {
MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);		MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;		unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();		std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
unsigned WidestRegister = TTI.getRegisterBitWidth(true);		unsigned WidestRegister = TTI.getRegisterBitWidth(true);

// Get the maximum safe dependence distance in bits computed by LAA.		// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
// the memory accesses that is most restrictive (involved in the smallest		// the memory accesses that is most restrictive (involved in the smallest
// dependence distance).		// dependence distance).
unsigned MaxSafeRegisterWidth = Legal->getMaxSafeRegisterWidth();		unsigned MaxSafeRegisterWidth = Legal->getMaxSafeRegisterWidth();

		if (UserVF) {
		// If legally unsafe, clamp the user vectorization factor to a safe value.
		auto MaxSafeVF = PowerOf2Floor(MaxSafeRegisterWidth / WidestType);
		Meinersbur: [style] LLVM coding style (https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction…) prefers explicit types in declarations.
		if (UserVF <= MaxSafeVF)
		return UserVF;

		LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
		<< " is unsafe, using maximum safe VF=" << MaxSafeVF
		fhahn: nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
		c-rhodes (Author):
		> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
		Good spot, I'll fix it before merging or update this patch once I get a better idea about the other suggestion you made.
		<< ".\n");
		Meinersbur: Should there (also) be a diagnostic warning (-Rpass) to inform the user that the value has been clamped? (Or maybe there is already and I don't see where it is done.)
		fhahn: I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (`OptimizationRemarkAnalysis` seems to be a suitable category here).
		c-rhodes (Author):
		> I think we currently would only issue a remark with the chosen VF, but it would probably be good to add a separate remark so it is easy for users to spot (OptimizationRemarkAnalysis seems to be a suitable category here).
		Done, thanks for pointing out OptimizationRemarkAnalysis
		return MaxSafeVF;
		}

WidestRegister = std::min(WidestRegister, MaxSafeRegisterWidth);		WidestRegister = std::min(WidestRegister, MaxSafeRegisterWidth);

// Ensure MaxVF is a power of 2; the dependence distance bound may not be.		// Ensure MaxVF is a power of 2; the dependence distance bound may not be.
// Note that both WidestRegister and WidestType may not be a powers of 2.		// Note that both WidestRegister and WidestType may not be a powers of 2.
unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);		unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);

LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType		LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
<< " / " << WidestType << " bits.\n");		<< " / " << WidestType << " bits.\n");
▲ Show 20 Lines • Show All 1,651 Lines • ▼ Show 20 Lines	LLVM_DEBUG(
"which requires masked-interleaved support.\n");		"which requires masked-interleaved support.\n");
if (CM.InterleaveInfo.invalidateGroups())		if (CM.InterleaveInfo.invalidateGroups())
// Invalidating interleave groups also requires invalidating all decisions		// Invalidating interleave groups also requires invalidating all decisions
// based on them, which includes widening decisions and uniform and scalar		// based on them, which includes widening decisions and uniform and scalar
// values.		// values.
CM.invalidateCostModelingDecisions();		CM.invalidateCostModelingDecisions();
}		}

if (!UserVF.isZero()) {		unsigned MaxVF = MaybeMaxVF.getValue();
		assert(MaxVF != 0 && "MaxVF is zero.");

		if (!UserVF.isZero() && UserVF.getKnownMinValue() <= MaxVF) {
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&		assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),		buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),
UserVF.getKnownMinValue());		UserVF.getKnownMinValue());
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {{UserVF, 0}};		return {{UserVF, 0}};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");

for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
// Collect Uniform and Scalar instructions after vectorization with VF.		// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(ElementCount::getFixed(VF));		CM.collectUniformsAndScalars(ElementCount::getFixed(VF));

// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
if (VF > 1)		if (VF > 1)
CM.collectInstsToScalarize(ElementCount::getFixed(VF));		CM.collectInstsToScalarize(ElementCount::getFixed(VF));
▲ Show 20 Lines • Show All 1,678 Lines • Show Last 20 Lines
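The clamping branch added to `computeFeasibleMaxVF` in the diff above can be summarized with a small standalone sketch. This is not the LLVM implementation: `clampUserVF` and `powerOf2Floor` are hypothetical names, and the debug message and optimization remark emitted by the real code are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two <= X (0 when X == 0). Hypothetical stand-in for the
// PowerOf2Floor helper used in the diff.
inline uint64_t powerOf2Floor(uint64_t X) {
  uint64_t P = 0;
  for (uint64_t Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Returns UserVF when it is safe, otherwise the largest safe power-of-two VF.
// MaxSafeVectorBits plays the role of Legal->getMaxSafeRegisterWidth() and
// WidestTypeBits the role of WidestType in the diff.
inline uint64_t clampUserVF(uint64_t UserVF, uint64_t MaxSafeVectorBits,
                            uint64_t WidestTypeBits) {
  assert(UserVF != 0 && "only meaningful when a user VF was given");
  assert(WidestTypeBits != 0 && "element type must have a size");
  uint64_t MaxSafeVF = powerOf2Floor(MaxSafeVectorBits / WidestTypeBits);
  if (UserVF <= MaxSafeVF)
    return UserVF; // The hint is safe; honor it unchanged.
  return MaxSafeVF; // The hint is unsafe; fall back to the safe maximum.
}
```

With the loop from the summary (dependence distance 2 on 32-bit `int`, so 64 safe bits), `clampUserVF(4, 64, 32)` yields 2, which matches the "User VF=4 is unsafe, using maximum safe VF=2" message checked by the new test.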

llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll

This file was added.

				; RUN: opt -loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s

				; Make sure the unsafe user specified vectorization factor is clamped.

				fhahn: It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
				c-rhodes (Author):
				> It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
				Is this loop like what you had in mind?
				  void foo(int *a, int *b, int N) {
				    #pragma clang loop vectorize(enable) vectorize_width(64)
				    for (int i=0; i<N; ++i) {
				      a[i + 32] = a[i] / b[i];
				    }
				  }
				When compiling with:
				  ./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu
				The user VF of 64 is unsafe so it's clamped to 32, and the vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), although the vectorization is forced so VF=32 is still chosen.
				c-rhodes (Author):
				> It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
				I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?
				fhahn:
				>> It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
				> I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?
				Agreed, such a test doesn't really add much. What I was suggesting was one where the cost model does pick a different VF than the maximum safe one. This is the case that should be handled differently with the current version compared to the first version.
				I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.
				  define void @test(i64* nocapture %a, i64* nocapture readonly %b) {
				  entry:
				    br label %loop.header

				  loop.header:
				    %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
				    %arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
				    %0 = load i64, i64* %arrayidx, align 4
				    %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 %iv
				    %1 = load i64, i64* %arrayidx2, align 4
				    %add = add nsw i64 %1, %0
				    %2 = add nuw nsw i64 %iv, 16
				    %arrayidx5 = getelementptr inbounds i64, i64* %a, i64 %2
				    %c = icmp eq i64 %1, 120
				    br i1 %c, label %then, label %latch

				  then:
				    store i64 %add, i64* %arrayidx5, align 4
				    br label %latch

				  latch:
				    %iv.next = add nuw nsw i64 %iv, 1
				    %exitcond.not = icmp eq i64 %iv.next, 1024
				    br i1 %exitcond.not, label %exit, label %loop.header, !llvm.loop !0

				  exit:
				    ret void
				  }

				  !0 = !{!0, !1, !2}
				  !1 = !{!"llvm.loop.vectorize.width", i64 32}
				  !2 = !{!"llvm.loop.vectorize.enable", i1 true}
				c-rhodes (Author):
				> I was thinking about a test like the one below. It requests VF = 32, but only 16 is safe. When built with opt -loop-vectorize -mtriple=arm64-apple-iphoneos, the cost model should pick VF = 2 instead of the higher alternatives.
				That makes sense cheers, I've added the test
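The scenario discussed in this thread (requested VF of 32, safe maximum of 16, cost model settling on 2) follows from the planner change in the diff: an unsafe user VF is no longer used directly, and the candidate powers of two up to MaxVF are evaluated instead. The sketch below is a simplified illustration under that reading, not the LLVM planner; `chooseVF` and `toyCostPerLane` are hypothetical, and the toy costs are invented purely so that VF=2 comes out cheapest.

```cpp
#include <cassert>

// Hypothetical per-lane cost for a given VF. Real costs come from the target
// cost model; these numbers are made up for illustration only.
inline double toyCostPerLane(unsigned VF) {
  switch (VF) {
  case 1:
    return 10.0;
  case 2:
    return 6.0;
  default:
    return 8.0;
  }
}

// Picks a VF: a safe user VF wins outright; otherwise the cheapest
// power-of-two candidate in [1, MaxVF] is selected.
inline unsigned chooseVF(unsigned UserVF, unsigned MaxVF,
                         double (*CostPerLane)(unsigned)) {
  assert(MaxVF != 0 && "MaxVF is zero.");
  assert(MaxVF <= (1u << 30) && "keeps VF *= 2 from overflowing");
  if (UserVF != 0 && UserVF <= MaxVF)
    return UserVF; // Safe hint: use it without consulting the cost model.

  unsigned BestVF = 1;
  double BestCost = CostPerLane(1);
  for (unsigned VF = 2; VF <= MaxVF; VF *= 2) {
    double Cost = CostPerLane(VF);
    if (Cost < BestCost) {
      BestCost = Cost;
      BestVF = VF;
    }
  }
  return BestVF;
}
```

Under these toy costs, `chooseVF(32, 16, toyCostPerLane)` falls through to the candidate loop and returns 2, whereas a safe hint such as `chooseVF(8, 16, toyCostPerLane)` is returned unchanged.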
				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; void foo(int a, int b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(4)
				; for (int i=0; i<N; ++i) {
				; a[i + 2] = a[i] + b[i];
				; }
				; }

				; CHECK: LV: User VF=4 is unsafe, using maximum safe VF=2.
				; CHECK-LABEL: @foo
				; CHECK: <2 x i32>
				define void @foo(i32* nocapture %a, i32* nocapture readonly %b, i32 %N) {
				entry:
				%cmp12 = icmp sgt i32 %N, 0
				br i1 %cmp12, label %for.body.preheader, label %for.cond.cleanup

				fhahn: nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.
				c-rhodes (Author):
				> nit: Is this required? LV will add a minimum iteration check anyways, so this could be dropped to make the IR more compact? Alternatively we could also use a constant trip count.
				Changed it to use a constant trip count
				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %N to i64
				br label %for.body

				fhahn: nit: Is this required? We could just change `%N` to be an `i64` to make the IR more compact.
				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				fhahn: nit: can we drop the `indvars.` prefix to make the IR slightly more readable?
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %indvars.iv, 2
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !0
				}

				!0 = !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.enable", i1 true}