This is an archive of the discontinued LLVM Phabricator instance.

Differential D95139

[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
ClosedPublic

Authored by david-arm on Jan 21 2021, 7:44 AM.

Details

Reviewers

sdesmalen
kmclaughlin
CarolineConcatto
c-rhodes
greened
fhahn
efriedma
frasercrmck

Commits

rGfec0a0adac54: [SVE][LoopVectorize] Add support for extracting the last lane of a scalable…

Summary

There are certain loops, such as the one below:

for (int i = 0; i < n; i++) {
  a[i] = b[i] + 1;
  *inv = a[i];
}

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for non-constant lanes that still permits the
caching of values. Whilst doing this work I couldn't find any explicit tests
for extracting the last lane values of fixed width vectors so I added tests
for both scalable and fixed width vectors.
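
The difference between the two cases can be illustrated with a small standalone sketch. It does not use LLVM: `ElementCountSketch` and `lastLaneIndex` are hypothetical names, and `VScale` stands in for the hardware-dependent multiplier that is only known at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount: a known minimum number of
// elements plus a flag saying whether it is multiplied by vscale at run time.
struct ElementCountSketch {
  unsigned KnownMin;
  bool Scalable;
};

// Index of the final lane. For fixed-width vectors VScale is ignored, so the
// result is a compile-time property of the type; for scalable vectors the
// result depends on the runtime value of VScale.
uint64_t lastLaneIndex(ElementCountSketch VF, uint64_t VScale) {
  uint64_t NumElts =
      VF.Scalable ? uint64_t(VF.KnownMin) * VScale : uint64_t(VF.KnownMin);
  return NumElts - 1;
}
```

For a fixed `<4 x i32>` the last lane is always 3, whereas for `<vscale x 4 x i32>` it is 7 when vscale is 2 and 15 when vscale is 4.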

Diff Detail

Event Timeline

david-arm created this revision. Jan 21 2021, 7:44 AM

Herald added a reviewer: efriedma. Jan 21 2021, 7:44 AM

Herald added subscribers: NickHung, psnobl, rogfer01 and 2 others.

david-arm requested review of this revision. Jan 21 2021, 7:44 AM

Herald added a project: Restricted Project. Jan 21 2021, 7:44 AM

Herald added subscribers: llvm-commits, vkmr.

Harbormaster completed remote builds in B86097: Diff 318200. Jan 21 2021, 9:32 AM

david-arm edited reviewers, added: fhahn; removed: efriedma. Jan 22 2021, 5:08 AM

Herald added a reviewer: efriedma. Jan 22 2021, 5:08 AM

david-arm added a reviewer: frasercrmck. Jan 22 2021, 9:15 AM

david-arm mentioned this in D95245: [SVE] Add support for scalable vectorization of loops with int/fast FP reductions. Jan 25 2021, 1:37 AM

CarolineConcatto added a child revision: D95363: [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse. Jan 25 2021, 7:22 AM

david-arm mentioned this in D95363: [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse. Jan 25 2021, 8:45 AM

sdesmalen added inline comments. Jan 26 2021, 5:54 AM

llvm/lib/Transforms/Vectorize/VPlan.h
168	For scalable vectors, it probably only ever makes sense to capture any of the following lanes:
	  The first N lanes from <vscale x N x <eltty>>
	  The last N lanes from <vscale x N x <eltty>>
	I'm not sure if the loop-vectorizer would currently ever need more than just the first/last lane, but I could imagine for interleaving it may want to extract the second/third/fourth-last value from the vector. Perhaps you can represent this with:
	  unsigned LaneIdx;
	  enum {
	    LK_Fixed,
	    LK_ScalableFirst,
	    LK_ScalableLast,
	  } LaneKind;
	?

david-arm added inline comments. Jan 26 2021, 8:20 AM

llvm/lib/Transforms/Vectorize/VPlan.h
168	I'm happy with the idea of adding an extra member to VPInstance that contains an enum; it is probably nicer than what I have now! I'm not sure if we need a LK_ScalableFirst though, as this is always known at compile time to be 0 I think - perhaps we just need a LK_First and a LK_ScalableLast? Also, are you suggesting that the enum describes how to use Lane, i.e.
	  if (StartFromFirst)
	    Index = Lane
	  else if (StartFromLast)
	    Index = NumElts - 1 - Lane
	? Or with LK_ScalableLast do you literally mean the last lane of the vector?

Following @sdesmalen's suggestion I've added an enum to VPIteration that describes how we should interpret the 'Lane' member variable. This means we can support offsets from the last lane up to the known minimum number of elements.

sdesmalen added inline comments. Jan 28 2021, 7:30 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	should this be implied by Instance.Lane > 0? (implemented with operator overload, `Instance > 0`)
2507–2513	nit:
	  switch (Instance.Kind) {
	  case VPIteration::LK_First:
	    Lane = Builder.getInt32(Instance.Lane);
	    break;
	  case VPIteration::LK_ScalableLast:
	    Lane = Builder.CreateSub(...);
	    break;
	  }
	(without default). That will give a compile-time warning when a new kind is added.
4453	Should Lane not be set to VF.getKnownMinValue() - 1 for VF.isScalable() as well?
llvm/lib/Transforms/Vectorize/VPlan.h
166–173	How about:
	  /// LaneKind describes how to interpret Lane.
	  /// For LK_First, Lane is the index into the first N elements of a fixed-vector <N x <eltty>> or a scalable vector <vscale x N x <eltty>>.
	  /// For LK_ScalableLast, Lane is the index into the last N elements of a scalable vector <vscale x N x <eltty>>
168	I indeed meant that LaneKind indexes the first or last 'chunk' in the vector, e.g. `v0, v1` for the first, and `vN-2, vN-1` for the last in: <vscale x 2 x i32> <=> <elt0, elt1 | elt2, elt3 | ... | eltN-2, eltN-1>
234	This only needs to be `2 * VF.getKnownMinValue()` if VF is scalable.
241	use LK_First and drop the default, so that you get a compile-time warning if a new kind is added to the enum.
243–244	Can you make this a method to VPIteration, e.g. `getIndex()`

david-arm added inline comments. Jan 28 2021, 7:43 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4453	Not in the way I've defined it, i.e. LastLane - Lane. This is a bit similar to how the offset is defined for Cullen's vector.splice intrinsic. A Lane of 0 is the same as LastLane - this does make the calculation of the runtime lane a bit easier.
llvm/lib/Transforms/Vectorize/VPlan.h
166–173	Sure, although I think the way I've defined LK_ScalableLast is not quite the same as you described above. The calculation of lane in my patch is: LaneFromStartOfVec = LastLaneOfVec - Lane.
243–244	Sure. I did originally think about that, but then I wondered if that only really makes sense in the context of a cache? For example, if I move this to VPIteration then I think I should also move getNumCachedLanes() there for consistency otherwise it's a bit odd having the cache size defined in one class and the mapping to a cache index in another.
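
The definition david-arm describes here, where Lane counts backwards from the end of the vector, can be written as a one-line calculation. This is a standalone sketch rather than code from the patch: `laneFromStartOfVec` is a hypothetical helper, and `NumElts` stands for the total element count, which for a scalable vector is only known at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: Lane is an offset counted backwards from the final
// lane of a vector with NumElts elements, so Lane == 0 means the last lane.
uint64_t laneFromStartOfVec(uint64_t NumElts, uint64_t Lane) {
  assert(Lane < NumElts && "offset reaches past the start of the vector");
  uint64_t LastLaneOfVec = NumElts - 1;
  return LastLaneOfVec - Lane;
}
```

Under this reading the common case, the very last lane, is always encoded as Lane = 0 regardless of the vector length.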

fhahn added inline comments. Jan 28 2021, 7:52 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
241	can you add support here as well? The callback is going away soon, so we also need to support the non-callback version.
llvm/lib/Transforms/Vectorize/VPlan.h
175	are you planning on adding more kinds here? otherwise this can just be a boolean? Or make this an `enum class`?
182	Can this have a better name, e.g. in line with the enum value or the Boolean variable name, if you change it?
243–244	I think both should be moved to `VPIteration`, as we need to support both `VPIteration` versions there as well.
llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗	(On Diff #319853)	This test does not need to be neon specific, right? extracting the last lane for fixed vectors should be tested fairly well already, so not sure if the test is needed at all?
llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
23	not needed?
26	nit: the names of the blocks could be improved.

Hi @fhahn thanks for the review!

llvm/lib/Transforms/Vectorize/VPlan.cpp
241	OK, the reason I didn't add this originally is because I cannot test this code path in my patch. I thought it might be bad practice to add code without testing it, but I'm happy to add support here if you want. I guess we do have a test for it, so when the callback is removed if there is a bug it will break the test.
llvm/lib/Transforms/Vectorize/VPlan.h
175	I don't have plans to add other kinds here at the moment - I chose an enum here to make it extensible should people wish to add other kinds in future and based on earlier reviewer comments. I'm happy to change it to `enum class` or use a boolean - @sdesmalen don't know if you have a preference here?
llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗	(On Diff #319853)	Similar to what I mentioned in the commit message, I deliberately added a llvm_unreachable() in the code that I've changed and I found no explicit tests for this at all. There is some limited coverage, but accidentally so in cases where the test was actually trying to test something else. How about I move this to a generic place?
llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
26	OK, to be honest I don't really know what they should be called. :) This is the name that LLVM generates. How about `for.cond.pre-cleanup`?

fhahn added inline comments. Jan 28 2021, 8:14 AM

llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗	(On Diff #319853)	Sounds good to me
llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
26	how about something like just `exit`. It can also directly return; you don't need the `%mul.lcssa` phi I think, LV will insert them if needed.

Changed some comments in VPIteration.
Moved getNumCachedLanes() and mapLaneToCacheIndex() to VPIteration.
Changed enum LaneKind to enum class LaneKind, which required introducing constructors in a previous patch.

david-arm marked 16 inline comments as done. Jan 29 2021, 5:38 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	I thought that adding an operator overload to struct VPIteration seemed a bit unnecessary for just this one case. The way I've defined LK_ScalableLast to work means that the pair (Instance.Lane=0,Instance.Kind=LK_ScalableLast) actually refers to the last element of the vector, i.e. LastLane - Instance.Lane.
llvm/lib/Transforms/Vectorize/VPlan.h
182	Hi @fhahn It's hard to come up with a much better name - how about "isKnownLane"? This terminology is used throughout the codebase to mean something that is known at compile time, e.g. ElementCount::getKnownMinValue()
241	Sadly this way I also get a compile-time warning about ending the function without returning a value, which means I still have to add a default here.

david-arm added a parent revision: D95676: [VPlan][NFC] Introduce constructor for VPIteration. Jan 29 2021, 6:44 AM

sdesmalen mentioned this in D95676: [VPlan][NFC] Introduce constructor for VPIteration. Feb 2 2021, 1:52 AM

Rebase.

sdesmalen added inline comments. Feb 8 2021, 9:18 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	Okay, fair enough on not overloading the operator. I'm a bit concerned about exposing Lane directly though, because this now means you'll need to check Instance.isKnownLane() to know whether this concerns the first lane, or the last. Any code that forgets to check this may make the wrong assumptions. I'd suggest making Lane private, and adding interfaces such as: getAsFirst(int Lane) getAsScalableLast(int Lane) which assert that the lane starts at the beginning (getFromFirst) or from the back of a scalable vector (getFromScalableLast). It would also remove the need for the additional asserts you added to check if the VPIteration Instance is a known lane.
2486	nit: assert((!Instance.isFirstLane() || Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)) && "Uniform values only have lane zero"); ?
4451	Can you restructure this so that either: Lane/Kind are explicitly set for both Fixed and Scalable case. Or alternatively: Both Lane and Kind are initialized for Fixed, and only updated for scalable.
4453	What I don't like about that definition is that the representation in the LoopVectorizer is entirely different for scalable and fixed-width vectors. `{0, First}` and `{0, LastScalable}` mean the first and last element, respectively. `{3, First}` means the last for `<4 x i32>` and `{3, LastScalable}` means 4th last element (for `<vscale x 4 x i32>`). I think this is confusing and error prone. If First/ScalableLast would mean the first and last scalable "chunk" of a vector, with the index being the index as normal, then the last iteration would be described as `{3, First}` for `<4 x i32>` and `{3, ScalableLast}` for `<vscale x 4 x i32>`, and so the representation in the LoopVectorizer is more or less the same. Code-generating it is slightly different of course, but there's probably only a single place where that has to happen.
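
The chunk-based reading proposed in the comment on line 4453 can be sketched without LLVM. The names below (`LaneKindSketch`, `absoluteLane`) are hypothetical and only illustrate the encoding: Lane always indexes into a chunk of `KnownMin` elements, and the kind selects whether that chunk is the first or the last one in the vector.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two kinds discussed in the review.
enum class LaneKindSketch { First, ScalableLast };

// Absolute lane index under the chunk-based reading. VScale is the runtime
// multiplier of a scalable vector and is unused for LaneKindSketch::First.
uint64_t absoluteLane(LaneKindSketch Kind, unsigned Lane, unsigned KnownMin,
                      uint64_t VScale) {
  assert(Lane < KnownMin && "Lane must index within a single chunk");
  if (Kind == LaneKindSketch::First)
    return Lane;
  // ScalableLast: skip to the start of the last chunk, then add the offset.
  uint64_t NumElts = uint64_t(KnownMin) * VScale;
  return NumElts - KnownMin + Lane;
}
```

With this encoding `{3, First}` on `<4 x i32>` and `{3, ScalableLast}` on `<vscale x 4 x i32>` both name the final element, which is the symmetry the reviewer is asking for.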

david-arm added inline comments. Feb 8 2021, 9:33 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4453	OK, I'll have a think about it and see what code changes would be required. I think in terms of code-generating the lane index, starting from the end is easier for the developer as the most common case will be the very last lane. That's why in my original patch I'd only catered specifically for the very last lane. If we define it in terms of chunks (really just subvectors I think) I might change some of the interfaces and names to make it clear that we're dealing with lane indices from the start of what is essentially a subvector.

david-arm added inline comments. Feb 10 2021, 7:06 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	I'm not sure about having enum specific getAsXX functions to be honest. Having to add a new function for any new types people want to add in the future seems a bit inflexible. However, I'll try to find a way of doing something better here if possible. I do take your point about making member variables private, but that will require another NFC patch to change `struct` to `class` and adding get/set variables first. It won't be a small change. :)

Added a new VPLane class to contain the lane offset and kind, plus added methods getKnownValue and getExpr to return the compile-time value and runtime expression, respectively.
Changed the definition of ScalableLast to Sander's suggestion where the lane value is the offset from the start of the last subvector.
Folded some asserts about the lane value into VPLane::getKnownValue() and VPLane::mapToCacheIndex.
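
One way to picture the cache mapping mentioned above is the standalone sketch below. It follows the earlier review comment that the cache only needs `2 * VF.getKnownMinValue()` entries when VF is scalable; the concrete layout (ScalableLast lanes stored after the First lanes) and the names `KindSketch`, `numCachedLanes` and `mapToCacheIndex` as free functions are assumptions for illustration, not the code in the patch.

```cpp
#include <cassert>

// Hypothetical mirror of the lane kinds used in the review.
enum class KindSketch { First, ScalableLast };

// Number of scalar cache slots per vector value: one per lane of the first
// subvector, plus one per lane of the last subvector when VF is scalable.
unsigned numCachedLanes(unsigned KnownMin, bool Scalable) {
  return Scalable ? 2 * KnownMin : KnownMin;
}

// Assumed layout: First lanes occupy [0, KnownMin) and ScalableLast lanes
// occupy [KnownMin, 2 * KnownMin).
unsigned mapToCacheIndex(KindSketch Kind, unsigned Lane, unsigned KnownMin,
                         bool Scalable) {
  assert(Lane < KnownMin && "lane must lie inside one subvector");
  if (Kind == KindSketch::ScalableLast) {
    assert(Scalable && "ScalableLast only makes sense for scalable VFs");
    return KnownMin + Lane;
  }
  return Lane;
}
```

Because both kinds use a compile-time offset within a subvector, every cache index is a constant even though the absolute lane of a ScalableLast entry is not.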

david-arm marked 4 inline comments as done. Feb 15 2021, 9:11 AM

Matt added a subscriber: Matt. Feb 16 2021, 9:05 AM

Thanks for the changes @david-arm, I think this is moving in the right direction.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	nit: Can you maybe write this as `Instance.Lane.isFirst()`? Personally I find the difference between isFirstLane and isFirstIteration a bit confusing. (same for other places)
4449–4455	is it worth creating a `static VPIteration::getLastLaneForVF(ElementCount)` for this?
llvm/lib/Transforms/Vectorize/VPlan.cpp
70	don't use default. Cover both cases explicitly, so that if another enum value is added, the compiler will emit a diagnostic that this case is not covered.
llvm/lib/Transforms/Vectorize/VPlan.h
98	nit: if you put the comments above each enum-value, doxygen generates a nice table with a comment describing what each enum-value means. See for example `ARMLdStMultipleTiming` in ARMSubtarget.h (and the generated doxygen here: https://llvm.org/doxygen/classllvm_1_1ARMSubtarget.html#ac7324b67d7e3be270177e6590f0bb1e5)
108	I'd prefer this to just be named `Lane`, because it is still a lane. The First or ScalableLast tells in which chunk of the vector this lane lives.
118	nit: `getKnownLane`.
125	nit: `getLaneAsRuntimeExpr` ?

david-arm added inline comments. Feb 18 2021, 1:40 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2486	Sure. I added isFirstIteration at your suggestion because it refers to Part=0 as well as Lane=0.
llvm/lib/Transforms/Vectorize/VPlan.cpp
70	OK I can do that - it just might mean adding an initialiser to Lane at the start of the function. I can't return directly from a case statement without a default, as the compiler otherwise warns about control reaching the end of a non-void function.

fhahn added inline comments. Feb 18 2021, 1:45 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
70	can you just put `llvm_unreachable` after the switch to get rid of the warning?

david-arm added inline comments. Feb 18 2021, 1:50 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
70	Yeah I can do that if that's the preferred way?

Renamed and removed some interfaces.
Removed default case in getAsRuntimeExpr, but had to add llvm_unreachable() as a result in order to kill the resulting compiler warning.
Add VPLane::getLastLaneForVF().
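
The pattern settled on in this exchange, a switch that covers every enumerator with no default followed by an unreachable marker, looks roughly like the sketch below. It is standalone: `KindSketch` and `kindName` are hypothetical, and `std::abort()` is used as a stand-in for `llvm_unreachable()` so that the example builds without LLVM.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical mirror of the lane kinds used in the review.
enum class KindSketch { First, ScalableLast };

std::string kindName(KindSketch K) {
  // No default label: GCC and Clang can then warn (-Wswitch) if a new
  // enumerator is added to KindSketch without a matching case here.
  switch (K) {
  case KindSketch::First:
    return "First";
  case KindSketch::ScalableLast:
    return "ScalableLast";
  }
  // Stand-in for llvm_unreachable(): std::abort() never returns, so the
  // compiler does not warn about falling off the end of a non-void function.
  std::abort();
}
```

The trade-off is that a missing case becomes a build-time diagnostic rather than a silently taken default branch.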

david-arm marked 8 inline comments as done. Feb 19 2021, 9:06 AM

sdesmalen added inline comments. Feb 23 2021, 3:17 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2942	nit: maybe add a `static VPIteration::getFirstLane`? (seems to be useful in several places).
llvm/lib/Transforms/Vectorize/VPlan.cpp
242	should this use `mapToCacheIndex` ?
llvm/lib/Transforms/Vectorize/VPlan.h
166–168	nit: `const VPLane &`
170–171	nit: is `isFirstIteration` unused now?

Rebase required changing VPTransformState since VectorizerValueMap has been removed.
Added VPLane::getFirstLane()
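
The two static helpers discussed in the review, one for the first lane and one for the last lane of a given VF, might look something like the sketch below. `LaneSketch` is a hypothetical, LLVM-free stand-in for VPLane, and the VF is passed as a plain (known minimum, scalable) pair rather than an `ElementCount`.

```cpp
#include <cassert>

// Hypothetical stand-in for VPLane: a lane offset plus the kind that says
// which subvector the offset is relative to.
class LaneSketch {
public:
  enum class Kind { First, ScalableLast };

  LaneSketch(unsigned L, Kind K) : Lane(L), LaneKind(K) {}

  // Lane 0 of the first subvector.
  static LaneSketch getFirstLane() { return LaneSketch(0, Kind::First); }

  // Final lane for a VF: the offset is always KnownMin - 1, and only the
  // kind differs between fixed-width and scalable vectors.
  static LaneSketch getLastLaneForVF(unsigned KnownMin, bool Scalable) {
    assert(KnownMin > 0 && "VF must have at least one element");
    return LaneSketch(KnownMin - 1,
                      Scalable ? Kind::ScalableLast : Kind::First);
  }

  unsigned getLane() const { return Lane; }
  Kind getKind() const { return LaneKind; }
  bool isFirstLane() const { return Lane == 0 && LaneKind == Kind::First; }

private:
  unsigned Lane;
  Kind LaneKind;
};
```

Keeping the members private and going through named constructors means callers cannot read the offset without also having the kind available, which addresses the earlier concern about code that forgets to check it.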

david-arm marked 4 inline comments as done. Feb 24 2021, 8:41 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/VPlan.h
170–171	It's still used in two places so I've left this in.

Harbormaster completed remote builds in B90624: Diff 326108. Feb 24 2021, 10:19 AM

Thanks for all the changes. The patch LGTM with nits addressed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4444	nit: Did you need to move the call to Builder.SetInsertPoint?
llvm/lib/Transforms/Vectorize/VPlan.cpp
429	nit: it would be nice if you can write `State->Instance->Lane = VPLane(Lane, VPLane::Kind::First);` (e.g. by providing `VPLane(const VPLane &Other)` in favour of having a `set` method).
llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
2	nit: passing the attribute in the command is redundant if it's already set by the IR attributes.

This revision is now accepted and ready to land. Mar 2 2021, 1:00 AM

fhahn added inline comments. Mar 2 2021, 1:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1155	It looks like this is only used in `llvm/lib/Transforms/Vectorize/VPlan.cpp` in this patch? Should it be defined directly there? If it needs to be shared between multiple files it would probably be better to just put the declaration into a header?
llvm/test/Transforms/LoopVectorize/extract-last-veclane.ll
56	Is there reason to use the metadata here? Can we instead just use the `-force-vector-width` option, which should be a bit simpler and the expected VF is clear from the run line?

david-arm marked an inline comment as done. Mar 3 2021, 12:27 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1155	It will be used by code in this file in other patches - for example, @CarolineConcatto currently has a patch that also uses this function. I can try to find a suitable header to put this in.

david-arm updated this revision to Diff 328117. Mar 4 2021, 3:55 AM

david-arm marked 5 inline comments as done.

LGTM thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
63	No need for `extern` here I think? Please also add a comment.

Harbormaster completed remote builds in B92034: Diff 328117. Mar 4 2021, 2:39 PM

Closed by commit rGfec0a0adac54: [SVE][LoopVectorize] Add support for extracting the last lane of a scalable… (authored by david-arm). Mar 5 2021, 1:58 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rGfec0a0adac54: [SVE][LoopVectorize] Add support for extracting the last lane of a scalable….

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	49 lines
llvm/lib/Transforms/Vectorize/VPlan.h	102 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp	29 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll	77 lines
llvm/test/Transforms/LoopVectorize/extract-last-veclane.ll	61 lines

Diff 323781

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[1,145 lines elided]
static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {		static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {
assert(isa<ConstantInt>(Step) && "Expected an integer step");		assert(isa<ConstantInt>(Step) && "Expected an integer step");
Constant *StepVal = ConstantInt::get(		Constant *StepVal = ConstantInt::get(
Step->getType(),		Step->getType(),
cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());		cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());
return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;		return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
}		}

		/// Return the runtime value for VF.
		Value *getRuntimeVF(IRBuilder<> &B, Type *Ty, ElementCount VF) {
		Constant *EC = ConstantInt::get(Ty, VF.getKnownMinValue());
		return VF.isScalable() ? B.CreateVScale(EC) : EC;
		}

namespace llvm {		namespace llvm {

void reportVectorizationFailure(const StringRef DebugMsg,		void reportVectorizationFailure(const StringRef DebugMsg,
const StringRef OREMsg, const StringRef ORETag,		const StringRef OREMsg, const StringRef ORETag,
OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I) {		OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I) {
LLVM_DEBUG(debugVectorizationFailure(DebugMsg, I));		LLVM_DEBUG(debugVectorizationFailure(DebugMsg, I));
LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);		LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);
ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(),		ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(),
[1,310 lines elided]
Value *		Value *
InnerLoopVectorizer::getOrCreateScalarValue(Value *V,		InnerLoopVectorizer::getOrCreateScalarValue(Value *V,
const VPIteration &Instance) {		const VPIteration &Instance) {
// If the value is not an instruction contained in the loop, it should		// If the value is not an instruction contained in the loop, it should
// already be scalar.		// already be scalar.
if (OrigLoop->isLoopInvariant(V))		if (OrigLoop->isLoopInvariant(V))
return V;		return V;

assert(Instance.Lane > 0		assert(!Instance.isFirstLane()
? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)		? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)
: true && "Uniform values only have lane zero");		: true && "Uniform values only have lane zero");

// If the value from the original loop has not been vectorized, it is		// If the value from the original loop has not been vectorized, it is
// represented by UF x VF scalar values in the new loop. Return the requested		// represented by UF x VF scalar values in the new loop. Return the requested
// scalar value.		// scalar value.
if (VectorLoopValueMap.hasScalarValue(V, Instance))		if (VectorLoopValueMap.hasScalarValue(V, Instance))
return VectorLoopValueMap.getScalarValue(V, Instance);		return VectorLoopValueMap.getScalarValue(V, Instance);

// If the value has not been scalarized, get its entry in VectorLoopValueMap		// If the value has not been scalarized, get its entry in VectorLoopValueMap
// for the given unroll part. If this entry is not a vector type (i.e., the		// for the given unroll part. If this entry is not a vector type (i.e., the
// vectorization factor is one), there is no need to generate an		// vectorization factor is one), there is no need to generate an
// extractelement instruction.		// extractelement instruction.
auto *U = getOrCreateVectorValue(V, Instance.Part);		auto *U = getOrCreateVectorValue(V, Instance.Part);
if (!U->getType()->isVectorTy()) {		if (!U->getType()->isVectorTy()) {
assert(VF.isScalar() && "Value not scalarized has non-vector type");		assert(VF.isScalar() && "Value not scalarized has non-vector type");
return U;		return U;
}		}

		Value *Lane = Instance.Lane.getExpr(Builder, VF);

// Otherwise, the value from the original loop has been vectorized and is		// Otherwise, the value from the original loop has been vectorized and is
// represented by UF vector values. Extract and return the requested scalar		// represented by UF vector values. Extract and return the requested scalar
// value from the appropriate vector lane.		// value from the appropriate vector lane.
return Builder.CreateExtractElement(U, Builder.getInt32(Instance.Lane));		return Builder.CreateExtractElement(U, Lane);
}		}

void InnerLoopVectorizer::packScalarIntoVectorValue(		void InnerLoopVectorizer::packScalarIntoVectorValue(
Value *V, const VPIteration &Instance) {		Value *V, const VPIteration &Instance) {
assert(V != Induction && "The new induction variable should not be used.");		assert(V != Induction && "The new induction variable should not be used.");
assert(!V->getType()->isVectorTy() && "Can't pack a vector");		assert(!V->getType()->isVectorTy() && "Can't pack a vector");
assert(!V->getType()->isVoidTy() && "Type does not produce a value");		assert(!V->getType()->isVoidTy() && "Type does not produce a value");

Value *ScalarInst = VectorLoopValueMap.getScalarValue(V, Instance);		Value *ScalarInst = VectorLoopValueMap.getScalarValue(V, Instance);
Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part);		Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part);
VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst,		VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst,
Builder.getInt32(Instance.Lane));		Instance.Lane.getExpr(Builder, VF));
VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue);		VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue);
}		}

void InnerLoopVectorizer::packScalarIntoVectorValue(VPValue *Def,		void InnerLoopVectorizer::packScalarIntoVectorValue(VPValue *Def,
const VPIteration &Instance,		const VPIteration &Instance,
VPTransformState &State) {		VPTransformState &State) {
Value *ScalarInst = State.get(Def, Instance);		Value *ScalarInst = State.get(Def, Instance);
Value *VectorValue = State.get(Def, Instance.Part);		Value *VectorValue = State.get(Def, Instance.Part);
VectorValue = Builder.CreateInsertElement(		VectorValue = Builder.CreateInsertElement(
VectorValue, ScalarInst, State.Builder.getInt32(Instance.Lane));		VectorValue, ScalarInst, Instance.Lane.getExpr(State.Builder, VF));
State.set(Def, VectorValue, Instance.Part);		State.set(Def, VectorValue, Instance.Part);
}		}

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
assert(!VF.isScalable() && "Cannot reverse scalable vectors");		assert(!VF.isScalable() && "Cannot reverse scalable vectors");
SmallVector<int, 8> ShuffleMask;		SmallVector<int, 8> ShuffleMask;
for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)		for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)
[392 lines elided]	void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, VPUser &User,

// Replace the operands of the cloned instructions with their scalar		// Replace the operands of the cloned instructions with their scalar
// equivalents in the new loop.		// equivalents in the new loop.
for (unsigned op = 0, e = User.getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = User.getNumOperands(); op != e; ++op) {
auto *Operand = dyn_cast<Instruction>(Instr->getOperand(op));		auto *Operand = dyn_cast<Instruction>(Instr->getOperand(op));
auto InputInstance = Instance;		auto InputInstance = Instance;
if (!Operand || !OrigLoop->contains(Operand) ||		if (!Operand || !OrigLoop->contains(Operand) ||
(Cost->isUniformAfterVectorization(Operand, State.VF)))		(Cost->isUniformAfterVectorization(Operand, State.VF)))
InputInstance.Lane = 0;		InputInstance.Lane.set(0, VPLane::Kind::First);
		sdesmalen: nit: maybe add a `static VPIteration::getFirstLane`? (seems to be useful in several places).
auto *NewOp = State.get(User.getOperand(op), InputInstance);		auto *NewOp = State.get(User.getOperand(op), InputInstance);
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

void InnerLoopVectorizer::fixLCSSAPHIs(VPTransformState &State) {
for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {		for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {
if (LCSSAPhi.getBasicBlockIndex(LoopMiddleBlock) != -1)		if (LCSSAPhi.getBasicBlockIndex(LoopMiddleBlock) != -1)
// Some phis were already hand updated by the reduction and recurrence		// Some phis were already hand updated by the reduction and recurrence
// code above, leave them alone.		// code above, leave them alone.
continue;		continue;

auto *IncomingValue = LCSSAPhi.getIncomingValue(0);		auto *IncomingValue = LCSSAPhi.getIncomingValue(0);
// Non-instruction incoming values will have only one value.		// Non-instruction incoming values will have only one value.
unsigned LastLane = 0;
if (isa<Instruction>(IncomingValue))		unsigned LaneOffset = 0;
LastLane = Cost->isUniformAfterVectorization(		VPLane::Kind Kind = VPLane::Kind::First;
		sdesmalen: nit: Did you need to move the call to Builder.SetInsertPoint?
cast<Instruction>(IncomingValue), VF)		Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
? 0		if (isa<Instruction>(IncomingValue) &&
: VF.getKnownMinValue() - 1;		!Cost->isUniformAfterVectorization(cast<Instruction>(IncomingValue),
assert((!VF.isScalable() || LastLane == 0) &&		VF)) {
"scalable vectors dont support non-uniform scalars yet");		LaneOffset = VF.getKnownMinValue() - 1;
		if (VF.isScalable())
		// In this case 'LaneOffset' refers to the offset from the start of the
		sdesmalen: Can you restructure this so that either Lane/Kind are explicitly set for both the Fixed and Scalable case, or alternatively both Lane and Kind are initialized for Fixed and only updated for scalable?
		// last subvector with VF.getKnownMinValue() elements.
		Kind = VPLane::Kind::ScalableLast;
		sdesmalen: Should Lane not be set to VF.getKnownMinValue() - 1 for VF.isScalable() as well?
		david-arm: Not in the way I've defined it, i.e. LastLane - Lane. This is a bit similar to how the offset is defined for Cullen's vector.splice intrinsic. A Lane of 0 is the same as LastLane, which does make the calculation of the runtime lane a bit easier.
		sdesmalen: What I don't like about that definition is that the representation in the LoopVectorizer is entirely different for scalable and fixed-width vectors. `{0, First}` and `{0, ScalableLast}` mean the first and last element, respectively. `{3, First}` means the last element for `<4 x i32>`, and `{3, ScalableLast}` means the 4th last element (for `<vscale x 4 x i32>`). I think this is confusing and error prone. If First/ScalableLast would mean the first and last scalable "chunk" of a vector, with the index being the index as normal, then the last iteration would be described as `{3, First}` for `<4 x i32>` and `{3, ScalableLast}` for `<vscale x 4 x i32>`, and so the representation in the LoopVectorizer is more or less the same. Code-generating it is slightly different of course, but there's probably only a single place where that has to happen.
		david-arm: OK, I'll have a think about it and see what code changes would be required. I think in terms of code-generating the lane index, starting from the end is easier for the developer, as the most common case will be the very last lane. That's why in my original patch I'd only catered specifically for the very last lane. If we define it in terms of chunks (really just subvectors I think) I might change some of the interfaces and names to make it clear that we're dealing with lane indices from the start of what is essentially a subvector.
		else
		Kind = VPLane::Kind::First;
		sdesmalen: Is it worth creating a `static VPIteration::getLastLaneForVF(ElementCount)` for this?
		}

// Can be a loop invariant incoming value or the last scalar value to be		// Can be a loop invariant incoming value or the last scalar value to be
// extracted from the vectorized loop.		// extracted from the vectorized loop.
Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
Value *lastIncomingValue =		Value *lastIncomingValue =
OrigLoop->isLoopInvariant(IncomingValue)		OrigLoop->isLoopInvariant(IncomingValue)
? IncomingValue		? IncomingValue
: State.get(State.Plan->getVPValue(IncomingValue),		: State.get(State.Plan->getVPValue(IncomingValue),
VPIteration(UF - 1, LastLane));		VPIteration(UF - 1, LaneOffset, Kind));
LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);		LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);
}		}
}		}
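The lane selection in fixLCSSAPHIs and the two lane kinds discussed in this thread can be modelled outside of LLVM. What follows is a minimal standalone sketch, not the patch's code: `LaneKind`, `Lane`, `VecFactor`, `lastLaneForVF` and `resolveLane` are hypothetical names, and `VScale` stands in for the runtime value of vscale. It assumes the "index into the last N-element subvector" reading of ScalableLast that the VPLane::Kind comment in this revision describes.

```cpp
#include <cassert>

// Hypothetical stand-in for VPLane::Kind.
enum class LaneKind { First, ScalableLast };

// Hypothetical stand-in for a VPLane: a lane value plus how to interpret it.
struct Lane {
  unsigned Val;
  LaneKind Kind;
};

// Hypothetical stand-in for ElementCount: N in <N x Ty> or <vscale x N x Ty>.
struct VecFactor {
  unsigned KnownMin;
  bool Scalable;
};

// Picks the lane holding the final scalar value produced by the vector loop.
// Uniform values live in lane 0; otherwise the last lane of the last
// N-element subvector is used.
inline Lane lastLaneForVF(VecFactor F, bool IsUniform) {
  if (IsUniform)
    return {0, LaneKind::First};
  return {F.KnownMin - 1,
          F.Scalable ? LaneKind::ScalableLast : LaneKind::First};
}

// Resolves a lane to an absolute element index once vscale is known.
inline unsigned resolveLane(Lane L, VecFactor F, unsigned VScale) {
  assert(L.Val < F.KnownMin && "lane must index within one subvector");
  if (L.Kind == LaneKind::First)
    return L.Val;
  assert(F.Scalable && VScale >= 1 && "ScalableLast needs a scalable VF");
  // Start of the last subvector, plus the offset within it.
  return (VScale - 1) * F.KnownMin + L.Val;
}
```

With this model the last lane is described as `{3, First}` for `<4 x i32>` and `{3, ScalableLast}` for `<vscale x 4 x i32>`, so only the final index computation differs between the fixed-width and scalable cases.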

void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {		void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {
// The basic block and loop containing the predicated instruction.		// The basic block and loop containing the predicated instruction.
auto *PredBB = PredInst->getParent();		auto *PredBB = PredInst->getParent();
auto *VectorLoop = LI->getLoopFor(PredBB);		auto *VectorLoop = LI->getLoopFor(PredBB);
void VPReplicateRecipe::execute(VPTransformState &State) {		void VPReplicateRecipe::execute(VPTransformState &State) {
if (State.Instance) { // Generate a single instance.		if (State.Instance) { // Generate a single instance.
assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");		assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
State.ILV->scalarizeInstruction(getUnderlyingInstr(), *this,		State.ILV->scalarizeInstruction(getUnderlyingInstr(), *this,
*State.Instance, IsPredicated, State);		*State.Instance, IsPredicated, State);
// Insert scalar instance packing it into a vector.		// Insert scalar instance packing it into a vector.
if (AlsoPack && State.VF.isVector()) {		if (AlsoPack && State.VF.isVector()) {
// If we're constructing lane 0, initialize to start from poison.		// If we're constructing lane 0, initialize to start from poison.
if (State.Instance->Lane == 0) {		if (State.Instance->isFirstLane()) {
assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");		assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Value *Poison = PoisonValue::get(		Value *Poison = PoisonValue::get(
VectorType::get(getUnderlyingValue()->getType(), State.VF));		VectorType::get(getUnderlyingValue()->getType(), State.VF));
State.ValueMap.setVectorValue(getUnderlyingInstr(),		State.ValueMap.setVectorValue(getUnderlyingInstr(),
State.Instance->Part, Poison);		State.Instance->Part, Poison);
}		}
State.ILV->packScalarIntoVectorValue(getUnderlyingInstr(),		State.ILV->packScalarIntoVectorValue(getUnderlyingInstr(),
*State.Instance);		*State.Instance);
for (unsigned Lane = 0; Lane < EndLane; ++Lane)
VPIteration(Part, Lane), IsPredicated,		VPIteration(Part, Lane), IsPredicated,
State);		State);
}		}

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");		assert(State.Instance && "Branch on Mask works only on single instance.");

unsigned Part = State.Instance->Part;		unsigned Part = State.Instance->Part;
unsigned Lane = State.Instance->Lane;		unsigned Lane = State.Instance->Lane.getKnownValue();

Value *ConditionBit = nullptr;		Value *ConditionBit = nullptr;
VPValue *BlockInMask = getMask();		VPValue *BlockInMask = getMask();
if (BlockInMask) {		if (BlockInMask) {
ConditionBit = State.get(BlockInMask, Part);		ConditionBit = State.get(BlockInMask, Part);
if (ConditionBit->getType()->isVectorTy())		if (ConditionBit->getType()->isVectorTy())
ConditionBit = State.Builder.CreateExtractElement(		ConditionBit = State.Builder.CreateExtractElement(
ConditionBit, State.Builder.getInt32(Lane));		ConditionBit, State.Builder.getInt32(Lane));

llvm/lib/Transforms/Vectorize/VPlan.h

class raw_ostream;		class raw_ostream;
class RecurrenceDescriptor;		class RecurrenceDescriptor;
class Value;		class Value;
class VPBasicBlock;		class VPBasicBlock;
class VPRegionBlock;		class VPRegionBlock;
class VPlan;		class VPlan;
class VPlanSlp;		class VPlanSlp;

/// A range of powers-of-2 vectorization factors with fixed start and		/// A range of powers-of-2 vectorization factors with fixed start and
		fhahn: No need for `extern` here I think? Please also add a comment.
/// adjustable end. The range includes start and excludes end, e.g.,:		/// adjustable end. The range includes start and excludes end, e.g.,:
/// [1, 9) = {1, 2, 4, 8}		/// [1, 9) = {1, 2, 4, 8}
struct VFRange {		struct VFRange {
// A power of 2.		// A power of 2.
const ElementCount Start;		const ElementCount Start;

// Need not be a power of 2. If End <= Start range is empty.		// Need not be a power of 2. If End <= Start range is empty.
ElementCount End;		ElementCount End;
};		};

using VPlanPtr = std::unique_ptr<VPlan>;		using VPlanPtr = std::unique_ptr<VPlan>;

/// In what follows, the term "input IR" refers to code that is fed into the		/// In what follows, the term "input IR" refers to code that is fed into the
/// vectorizer whereas the term "output IR" refers to code that is generated by		/// vectorizer whereas the term "output IR" refers to code that is generated by
/// the vectorizer.		/// the vectorizer.

		/// VPLane provides a way to access lanes in both fixed width and scalable
		/// vectors, where for the latter the lane index sometimes needs calculating
		/// as a runtime expression.
		class VPLane {
		public:
		/// Kind describes how to interpret Val.
		/// For First, Val is the index into the first N elements of a
		sdesmalen: nit: if you put the comments above each enum-value, doxygen generates a nice table with a comment describing what each enum-value means. See for example `ARMLdStMultipleTiming` in ARMSubtarget.h (and the generated doxygen here: https://llvm.org/doxygen/classllvm_1_1ARMSubtarget.html#ac7324b67d7e3be270177e6590f0bb1e5)
		/// fixed-vector <N x <ElTy>> or a scalable vector <vscale x N x <ElTy>>.
		/// For ScalableLast, Val is the offset from the start of the last N-element
		/// subvector in a scalable vector <vscale x N x <ElTy>>. For example, a Val
		/// of 0 corresponds to lane `(vscale - 1) * N`, a Val of 1 corresponds to
		/// `((vscale - 1) * N) + 1`, etc.
		enum class Kind { First, ScalableLast };

		private:
		/// in [0..VF)
		unsigned Val;
		sdesmalen: I'd prefer this to just be named `Lane`, because it is still a lane. The First or ScalableLast tells in which chunk of the vector this lane lives.

		/// Indicates how the Lane should be interpreted, as described above.
		Kind LaneKind;

		public:
		VPLane(unsigned Val, Kind LaneKind) : Val(Val), LaneKind(LaneKind) {}

		/// Returns a compile-time known value for the lane index and asserts if the
		/// lane can only be calculated at runtime.
		unsigned getKnownValue() const {
		sdesmalen: nit: `getKnownLane`.
		assert(LaneKind == Kind::First);
		return Val;
		}

		/// Returns an expression describing the lane index that can be used at
		/// runtime.
		Value *getExpr(IRBuilder<> &Builder, const ElementCount &VF) const;
		sdesmalen: nit: `getLaneAsRuntimeExpr`?

		/// Returns the Kind of lane offset.
		Kind getKind() const { return LaneKind; }

		/// Sets the lane offset and lane kind.
		void set(unsigned V, Kind K) {
		Val = V;
		LaneKind = K;
		}

		/// Returns true if this is the first lane of the whole vector.
		bool isFirst() const { return Val == 0 && LaneKind == Kind::First; }

		/// Maps the lane to a cache index based on \p VF.
		unsigned mapToCacheIndex(const ElementCount &VF) const {
		switch (LaneKind) {
		case VPLane::Kind::ScalableLast:
		assert(VF.isScalable() && Val < VF.getKnownMinValue());
		return VF.getKnownMinValue() + Val;
		default:
		assert(Val < VF.getKnownMinValue());
		return Val;
		}
		}

		/// Returns the maximum number of lanes that we are able to consider
		/// caching for \p VF.
		static unsigned getNumCachedLanes(const ElementCount &VF) {
		return VF.getKnownMinValue() * (VF.isScalable() ? 2 : 1);
		}
		};
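The cache layout implied by mapToCacheIndex and getNumCachedLanes can be illustrated with a standalone model. This is a sketch under stated assumptions rather than the patch's code: `LaneKind`, `numCachedLanes` and `mapToCacheIndex` below are hypothetical free functions taking plain integers instead of an ElementCount. The switch covers every enumerator without a `default`, so compilers with `-Wswitch` can flag a newly added kind, and a terminating call after the switch avoids the missing-return warning raised in this review. LLVM code typically uses `llvm_unreachable` for that purpose; `std::abort()` stands in for it here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for VPLane::Kind.
enum class LaneKind { First, ScalableLast };

// Number of scalar cache slots per unrolled part: N slots for the first
// subvector, plus N more for the last subvector when the VF is scalable.
inline unsigned numCachedLanes(unsigned KnownMin, bool Scalable) {
  return KnownMin * (Scalable ? 2 : 1);
}

// Maps a lane to its cache slot. First lanes use slots [0, N) and
// ScalableLast lanes use slots [N, 2N).
inline unsigned mapToCacheIndex(unsigned Val, LaneKind Kind,
                                unsigned KnownMin, bool Scalable) {
  assert(Val < KnownMin && "lane must index within one subvector");
  switch (Kind) {
  case LaneKind::First:
    return Val;
  case LaneKind::ScalableLast:
    assert(Scalable && "ScalableLast needs a scalable VF");
    return KnownMin + Val;
  }
  std::abort(); // Not reached while every enumerator returns above.
}
```

In this model `{3, First}` and `{3, ScalableLast}` for `<vscale x 4 x i32>` land in slots 3 and 7 respectively, so the two kinds never alias in the cache.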

/// VPIteration represents a single point in the iteration space of the output		/// VPIteration represents a single point in the iteration space of the output
/// (vectorized and/or unrolled) IR loop.		/// (vectorized and/or unrolled) IR loop.
struct VPIteration {		struct VPIteration {
/// in [0..UF)		/// in [0..UF)
unsigned Part;		unsigned Part;

/// in [0..VF)		VPLane Lane;
unsigned Lane;

VPIteration(unsigned Part, unsigned Lane) : Part(Part), Lane(Lane) {}		VPIteration(unsigned Part, unsigned Lane,
		VPLane::Kind Kind = VPLane::Kind::First)
		: Part(Part), Lane(Lane, Kind) {}
		sdesmalen: For scalable vectors, it probably only ever makes sense to capture any of the following lanes:
		- The first N lanes from <vscale x N x <eltty>>
		- The last N lanes from <vscale x N x <eltty>>
		I'm not sure if the loop-vectorizer would currently ever need more than just the first/last lane, but I could imagine for interleaving it may want to extract the second/third/fourth-last value from the vector. Perhaps you can represent this with: `unsigned LaneIdx; enum { LK_Fixed, LK_ScalableFirst, LK_ScalableLast } LaneKind;`?
		david-arm: I'm happy with the idea of adding an extra member to VPInstance that contains an enum; it is probably nicer than what I have now! I'm not sure if we need a LK_ScalableFirst though, as this is always known at compile time to be 0 I think. Perhaps we just need a LK_First and a LK_ScalableLast? Also, are you suggesting that the enum describes how to use Lane, i.e. `if (StartFromFirst) Index = Lane; else if (StartFromLast) Index = NumElts - 1 - Lane`? Or with LK_ScalableLast do you literally mean the last lane of the vector?
		sdesmalen: I indeed meant that LaneKind indexes the first or last 'chunk' in the vector, e.g. `v0, v1` for the first, and `vN-2, vN-1` for the last in: <vscale x 2 x i32> <=> <elt0, elt1 | elt2, elt3 | ... | eltN-2, eltN-1>
		sdesmalen: nit: `const VPLane &`

bool isFirstIteration() const { return Part == 0 && Lane == 0; }		bool isFirstLane() const { return Lane.isFirst(); }
		bool isFirstIteration() const { return Part == 0 && isFirstLane(); }
		sdesmalen: nit: is `isFirstIteration` unused now?
		david-arm: It's still used in two places so I've left this in.
};		};

		sdesmalen: How about: /// LaneKind describes how to interpret Lane. /// For LK_First, Lane is the index into the first N elements of a fixed-vector <N x <eltty>> or a scalable vector <vscale x N x <eltty>>. /// For LK_ScalableLast, Lane is the index into the last N elements of a scalable vector <vscale x N x <eltty>>
		david-arm: Sure, although I think the way I've defined LK_ScalableLast is not quite the same as you described above. The calculation of lane in my patch is: LaneFromStartOfVec = LastLaneOfVec - Lane.
/// This is a helper struct for maintaining vectorization state. It's used for		/// This is a helper struct for maintaining vectorization state. It's used for
/// mapping values from the original loop to their corresponding values in		/// mapping values from the original loop to their corresponding values in
		fhahn: Are you planning on adding more kinds here? Otherwise this can just be a boolean? Or make this an `enum class`?
		david-arm: I don't have plans to add other kinds here at the moment. I chose an enum here to make it extensible should people wish to add other kinds in future, and based on earlier reviewer comments. I'm happy to change it to `enum class` or use a boolean. @sdesmalen, do you have a preference here?
/// the new loop. Two mappings are maintained: one for vectorized values and		/// the new loop. Two mappings are maintained: one for vectorized values and
/// one for scalarized values. Vectorized values are represented with UF		/// one for scalarized values. Vectorized values are represented with UF
/// vector values in the new loop, and scalarized values are represented with		/// vector values in the new loop, and scalarized values are represented with
/// UF x VF scalar values in the new loop. UF and VF are the unroll and		/// UF x VF scalar values in the new loop. UF and VF are the unroll and
/// vectorization factors, respectively.		/// vectorization factors, respectively.
///		///
/// Entries can be added to either map with setVectorValue and setScalarValue,		/// Entries can be added to either map with setVectorValue and setScalarValue,
		fhahn: Can this have a better name, e.g. in line with the enum value or the Boolean variable name, if you change it?
		david-arm: Hi @fhahn, it's hard to come up with a much better name. How about "isKnownLane"? This terminology is used throughout the codebase to mean something that is known at compile time, e.g. ElementCount::getKnownMinValue()
/// which assert that an entry was not already added before. If an entry is to		/// which assert that an entry was not already added before. If an entry is to
/// replace an existing one, call resetVectorValue and resetScalarValue. This is		/// replace an existing one, call resetVectorValue and resetScalarValue. This is
/// currently needed to modify the mapped values during "fix-up" operations that		/// currently needed to modify the mapped values during "fix-up" operations that
/// occur once the first phase of widening is complete. These operations include		/// occur once the first phase of widening is complete. These operations include
/// type truncation and the second phase of recurrence widening.		/// type truncation and the second phase of recurrence widening.
///		///
/// Entries from either map can be retrieved using the getVectorValue and		/// Entries from either map can be retrieved using the getVectorValue and
/// getScalarValue functions, which assert that the desired value exists.		/// getScalarValue functions, which assert that the desired value exists.
public:
}		}

/// \return True if the map has any scalar entry for \p Key.		/// \return True if the map has any scalar entry for \p Key.
bool hasAnyScalarValue(Value *Key) const {		bool hasAnyScalarValue(Value *Key) const {
return ScalarMapStorage.count(Key);		return ScalarMapStorage.count(Key);
}		}

/// \return True if the map has a scalar entry for \p Key and \p Instance.		/// \return True if the map has a scalar entry for \p Key and \p Instance.
bool hasScalarValue(Value *Key, const VPIteration &Instance) const {		bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
		sdesmalen: This only needs to be `2 * VF.getKnownMinValue()` if VF is scalable.
assert(Instance.Part < UF && "Queried Scalar Part is too large.");		assert(Instance.Part < UF && "Queried Scalar Part is too large.");
assert(Instance.Lane < VF.getKnownMinValue() &&
"Queried Scalar Lane is too large.");

if (!hasAnyScalarValue(Key))		if (!hasAnyScalarValue(Key))
return false;		return false;
const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;		const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");		assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
assert(Entry[Instance.Part].size() == VF.getKnownMinValue() &&		assert(Entry[Instance.Part].size() == VPLane::getNumCachedLanes(VF) &&
		sdesmalen: Use LK_First and drop the default, so that you get a compile-time warning if a new kind is added to the enum.
		david-arm: Sadly this way I also get a compile-time warning about ending the function without returning a value, which means I still have to add a default here.
"ScalarParts has wrong dimensions.");		"ScalarParts has wrong dimensions.");
return Entry[Instance.Part][Instance.Lane] != nullptr;		unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);
		return Entry[Instance.Part][CacheIdx] != nullptr;
		sdesmalen: Can you make this a method to VPIteration, e.g. `getIndex()`
		david-arm: Sure. I did originally think about that, but then I wondered if that only really makes sense in the context of a cache? For example, if I move this to VPIteration then I think I should also move getNumCachedLanes() there for consistency, otherwise it's a bit odd having the cache size defined in one class and the mapping to a cache index in another.
		fhahn: I think both should be moved to `VPIteration`, as we need to support both `VPIteration` versions there as well.
}		}

/// Retrieve the existing vector value that corresponds to \p Key and		/// Retrieve the existing vector value that corresponds to \p Key and
/// \p Part.		/// \p Part.
Value *getVectorValue(Value *Key, unsigned Part) {		Value *getVectorValue(Value *Key, unsigned Part) {
assert(hasVectorValue(Key, Part) && "Getting non-existent value.");		assert(hasVectorValue(Key, Part) && "Getting non-existent value.");
return VectorMapStorage[Key][Part];		return VectorMapStorage[Key][Part];
}		}

/// Retrieve the existing scalar value that corresponds to \p Key and		/// Retrieve the existing scalar value that corresponds to \p Key and
/// \p Instance.		/// \p Instance.
Value *getScalarValue(Value *Key, const VPIteration &Instance) {		Value *getScalarValue(Value *Key, const VPIteration &Instance) {
assert(hasScalarValue(Key, Instance) && "Getting non-existent value.");		assert(hasScalarValue(Key, Instance) && "Getting non-existent value.");
return ScalarMapStorage[Key][Instance.Part][Instance.Lane];		unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);
		return ScalarMapStorage[Key][Instance.Part][CacheIdx];
}		}

/// Set a vector value associated with \p Key and \p Part. Assumes such a		/// Set a vector value associated with \p Key and \p Part. Assumes such a
/// value is not already set. If it is, use resetVectorValue() instead.		/// value is not already set. If it is, use resetVectorValue() instead.
void setVectorValue(Value *Key, unsigned Part, Value *Vector) {		void setVectorValue(Value *Key, unsigned Part, Value *Vector) {
assert(!hasVectorValue(Key, Part) && "Vector value already set for part");		assert(!hasVectorValue(Key, Part) && "Vector value already set for part");
if (!VectorMapStorage.count(Key)) {		if (!VectorMapStorage.count(Key)) {
VectorParts Entry(UF);		VectorParts Entry(UF);
VectorMapStorage[Key] = Entry;		VectorMapStorage[Key] = Entry;
}		}
VectorMapStorage[Key][Part] = Vector;		VectorMapStorage[Key][Part] = Vector;
}		}

/// Set a scalar value associated with \p Key and \p Instance. Assumes such a		/// Set a scalar value associated with \p Key and \p Instance. Assumes such a
/// value is not already set.		/// value is not already set.
void setScalarValue(Value *Key, const VPIteration &Instance, Value *Scalar) {		void setScalarValue(Value *Key, const VPIteration &Instance, Value *Scalar) {
assert(!hasScalarValue(Key, Instance) && "Scalar value already set");		assert(!hasScalarValue(Key, Instance) && "Scalar value already set");
if (!ScalarMapStorage.count(Key)) {		if (!ScalarMapStorage.count(Key)) {
ScalarParts Entry(UF);		ScalarParts Entry(UF);
// TODO: Consider storing uniform values only per-part, as they occupy		// TODO: Consider storing uniform values only per-part, as they occupy
// lane 0 only, keeping the other VF-1 redundant entries null.		// lane 0 only, keeping the other VF-1 redundant entries null.
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part].resize(VF.getKnownMinValue(), nullptr);		Entry[Part].resize(VPLane::getNumCachedLanes(VF), nullptr);
ScalarMapStorage[Key] = Entry;		ScalarMapStorage[Key] = Entry;
}		}
ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;		unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);
		ScalarMapStorage[Key][Instance.Part][CacheIdx] = Scalar;
}		}

/// Reset the vector value associated with \p Key for the given \p Part.		/// Reset the vector value associated with \p Key for the given \p Part.
/// This function can be used to update values that have already been		/// This function can be used to update values that have already been
/// vectorized. This is the case for "fix-up" operations including type		/// vectorized. This is the case for "fix-up" operations including type
/// truncation and the second phase of recurrence vectorization.		/// truncation and the second phase of recurrence vectorization.
void resetVectorValue(Value *Key, unsigned Part, Value *Vector) {		void resetVectorValue(Value *Key, unsigned Part, Value *Vector) {
assert(hasVectorValue(Key, Part) && "Vector value not set for part");		assert(hasVectorValue(Key, Part) && "Vector value not set for part");
VectorMapStorage[Key][Part] = Vector;		VectorMapStorage[Key][Part] = Vector;
}		}

/// Reset the scalar value associated with \p Key for \p Part and \p Lane.		/// Reset the scalar value associated with \p Key for \p Part and \p Lane.
/// This function can be used to update values that have already been		/// This function can be used to update values that have already been
/// scalarized. This is the case for "fix-up" operations including scalar phi		/// scalarized. This is the case for "fix-up" operations including scalar phi
/// nodes for scalarized and predicated instructions.		/// nodes for scalarized and predicated instructions.
void resetScalarValue(Value *Key, const VPIteration &Instance,		void resetScalarValue(Value *Key, const VPIteration &Instance,
Value *Scalar) {		Value *Scalar) {
assert(hasScalarValue(Key, Instance) &&		assert(hasScalarValue(Key, Instance) &&
"Scalar value not set for part and lane");		"Scalar value not set for part and lane");
ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;		unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);
		ScalarMapStorage[Key][Instance.Part][CacheIdx] = Scalar;
}		}
};		};

/// This class is used to enable the VPlan to invoke a method of ILV. This is		/// This class is used to enable the VPlan to invoke a method of ILV. This is
/// needed until the method is refactored out of ILV and becomes reusable.		/// needed until the method is refactored out of ILV and becomes reusable.
struct VPCallback {		struct VPCallback {
virtual ~VPCallback() {}		virtual ~VPCallback() {}
virtual Value *getOrCreateVectorValues(Value *V, unsigned Part) = 0;		virtual Value *getOrCreateVectorValues(Value *V, unsigned Part) = 0;
return I != Data.PerPartOutput.end() && Part < I->second.size() &&
I->second[Part];		I->second[Part];
}		}

bool hasScalarValue(VPValue *Def, VPIteration Instance) {		bool hasScalarValue(VPValue *Def, VPIteration Instance) {
auto I = Data.PerPartScalars.find(Def);		auto I = Data.PerPartScalars.find(Def);
if (I == Data.PerPartScalars.end())		if (I == Data.PerPartScalars.end())
return false;		return false;
return Instance.Part < I->second.size() &&		return Instance.Part < I->second.size() &&
Instance.Lane < I->second[Instance.Part].size() &&		Instance.Lane.getKnownValue() < I->second[Instance.Part].size() &&
I->second[Instance.Part][Instance.Lane];		I->second[Instance.Part][Instance.Lane.getKnownValue()];
}		}

/// Set the generated Value for a given VPValue and a given Part.		/// Set the generated Value for a given VPValue and a given Part.
void set(VPValue *Def, Value *V, unsigned Part) {		void set(VPValue *Def, Value *V, unsigned Part) {
if (!Data.PerPartOutput.count(Def)) {		if (!Data.PerPartOutput.count(Def)) {
DataState::PerPartValuesTy Entry(UF);		DataState::PerPartValuesTy Entry(UF);
Data.PerPartOutput[Def] = Entry;		Data.PerPartOutput[Def] = Entry;
}		}
Data.PerPartOutput[Def][Part] = V;		Data.PerPartOutput[Def][Part] = V;
}		}
void set(VPValue *Def, Value *IRDef, Value *V, unsigned Part);		void set(VPValue *Def, Value *IRDef, Value *V, unsigned Part);
void reset(VPValue *Def, Value *IRDef, Value *V, unsigned Part);		void reset(VPValue *Def, Value *IRDef, Value *V, unsigned Part);
void set(VPValue *Def, Value *IRDef, Value *V, const VPIteration &Instance);		void set(VPValue *Def, Value *IRDef, Value *V, const VPIteration &Instance);

void set(VPValue *Def, Value *V, const VPIteration &Instance) {		void set(VPValue *Def, Value *V, const VPIteration &Instance) {
auto Iter = Data.PerPartScalars.insert({Def, {}});		auto Iter = Data.PerPartScalars.insert({Def, {}});
auto &PerPartVec = Iter.first->second;		auto &PerPartVec = Iter.first->second;
while (PerPartVec.size() <= Instance.Part)		while (PerPartVec.size() <= Instance.Part)
PerPartVec.emplace_back();		PerPartVec.emplace_back();
auto &Scalars = PerPartVec[Instance.Part];		auto &Scalars = PerPartVec[Instance.Part];
while (Scalars.size() <= Instance.Lane)		while (Scalars.size() <= Instance.Lane.getKnownValue())
Scalars.push_back(nullptr);		Scalars.push_back(nullptr);
Scalars[Instance.Lane] = V;		Scalars[Instance.Lane.getKnownValue()] = V;
}		}

/// Hold state information used when constructing the CFG of the output IR,		/// Hold state information used when constructing the CFG of the output IR,
/// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.		/// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.
struct CFGState {		struct CFGState {
/// The previous VPBasicBlock visited. Initially set to null.		/// The previous VPBasicBlock visited. Initially set to null.
VPBasicBlock *PrevVPBB = nullptr;		VPBasicBlock *PrevVPBB = nullptr;

[...]

llvm/lib/Transforms/Vectorize/VPlan.cpp

[...]
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include <cassert>		#include <cassert>
#include <iterator>		#include <iterator>
#include <string>		#include <string>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
extern cl::opt<bool> EnableVPlanNativePath;		extern cl::opt<bool> EnableVPlanNativePath;
		extern Value *getRuntimeVF(IRBuilder<> &B, Type *Ty, ElementCount VF);

#define DEBUG_TYPE "vplan"		#define DEBUG_TYPE "vplan"

raw_ostream &llvm::operator<<(raw_ostream &OS, const VPValue &V) {		raw_ostream &llvm::operator<<(raw_ostream &OS, const VPValue &V) {
const VPInstruction *Instr = dyn_cast<VPInstruction>(&V);		const VPInstruction *Instr = dyn_cast<VPInstruction>(&V);
VPSlotTracker SlotTracker(		VPSlotTracker SlotTracker(
(Instr && Instr->getParent()) ? Instr->getParent()->getPlan() : nullptr);		(Instr && Instr->getParent()) ? Instr->getParent()->getPlan() : nullptr);
V.print(OS, SlotTracker);		V.print(OS, SlotTracker);
return OS;		return OS;
}		}

		Value *VPLane::getExpr(IRBuilder<> &Builder, const ElementCount &VF) const {
		Value *Lane;
		switch (LaneKind) {
		case VPLane::Kind::ScalableLast:
		// Lane = RuntimeVF - VF.getKnownMinValue() + Val
		Lane = Builder.CreateSub(getRuntimeVF(Builder, Builder.getInt32Ty(), VF),
		Builder.getInt32(VF.getKnownMinValue() - Val));
		break;
		default:
		sdesmalen: don't use default. Cover both cases explicitly, so that if another enum value is added, the compiler will emit a diagnostic this case is not covered.
		david-arm (author): OK I can do that - it just might mean adding an initialiser to Lane at the start of the function. I can't return directly from a case statement without a default as the compiler warns about functions returning void otherwise.
		fhahn: can you just put `llvm_unreachable` after the switch to get rid of the warning?
		david-arm (author): Yeah I can do that if that's the preferred way?
		assert(LaneKind == VPLane::Kind::First);
		Lane = Builder.getInt32(Val);
		break;
		}
		return Lane;
		}
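
The review thread above (no `default`, each case handled explicitly, an unreachable marker after the switch) can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `LaneKind` and `laneIndex` are hypothetical stand-ins for `VPLane::Kind` and `VPLane::getExpr`, plain integers replace the `IRBuilder` values, and `std::abort()` plays the role that `llvm_unreachable` would play in the real code. The `ScalableLast` arithmetic mirrors the `CreateSub` in the diff: runtime VF minus (known minimum VF minus the lane offset).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the two lane kinds discussed in the review.
enum class LaneKind { First, ScalableLast };

// Returns the lane index for a vector holding KnownMin * VScale elements.
// Offset, KnownMin and VScale are illustrative parameters, not LLVM API.
uint32_t laneIndex(LaneKind Kind, uint32_t Offset, uint32_t KnownMin,
                   uint32_t VScale) {
  // No default label: if a new enumerator is added, compilers that warn on
  // unhandled enum cases (e.g. -Wswitch) can flag this switch.
  switch (Kind) {
  case LaneKind::First:
    // The lane is counted from the start of the vector.
    return Offset;
  case LaneKind::ScalableLast:
    // RuntimeVF - (KnownMin - Offset), as in the CreateSub call in the diff.
    return KnownMin * VScale - (KnownMin - Offset);
  }
  // Stand-in for llvm_unreachable; silences "missing return" warnings.
  std::abort();
}
```

With a known minimum VF of 4 and an offset of 3, the `ScalableLast` case yields `4 * vscale - 1`, which is the `shl ..., 2` followed by `add ..., -1` sequence that the SVE test in this revision checks for.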

VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def)		VPValue::VPValue(const unsigned char SC, Value *UV, VPDef *Def)
: SubclassID(SC), UnderlyingVal(UV), Def(Def) {		: SubclassID(SC), UnderlyingVal(UV), Def(Def) {
if (Def)		if (Def)
Def->addDefinedValue(this);		Def->addDefinedValue(this);
}		}

VPValue::~VPValue() {		VPValue::~VPValue() {
assert(Users.empty() && "trying to delete a VPValue with remaining users");		assert(Users.empty() && "trying to delete a VPValue with remaining users");
[...]	VPBasicBlock::iterator VPBasicBlock::getFirstNonPhi() {
return It;		return It;
}		}

Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {		Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {
if (!Def->getDef())		if (!Def->getDef())
return Def->getLiveInIRValue();		return Def->getLiveInIRValue();

if (hasScalarValue(Def, Instance))		if (hasScalarValue(Def, Instance))
return Data.PerPartScalars[Def][Instance.Part][Instance.Lane];		return Data
		fhahn: can you add support here as well? The callback is going away soon, so we also need to support the non-callback version.
		david-arm (author): OK, the reason I didn't add this originally is because I cannot test this code path in my patch. I thought it might be bad practice to add code without testing it, but I'm happy to add support here if you want. I guess we do have a test for it, so when the callback is removed if there is a bug it will break the test.
		.PerPartScalars[Def][Instance.Part][Instance.Lane.getKnownValue()];
		sdesmalen: should this use `mapToCacheIndex` ?

if (hasVectorValue(Def, Instance.Part)) {		if (hasVectorValue(Def, Instance.Part)) {
assert(Data.PerPartOutput.count(Def));		assert(Data.PerPartOutput.count(Def));
auto *VecPart = Data.PerPartOutput[Def][Instance.Part];		auto *VecPart = Data.PerPartOutput[Def][Instance.Part];
if (!VecPart->getType()->isVectorTy()) {		if (!VecPart->getType()->isVectorTy()) {
assert(Instance.Lane == 0 && "cannot get lane > 0 for scalar");		assert(Instance.isFirstLane() && "cannot get lane > 0 for scalar");
return VecPart;		return VecPart;
}		}

		Value *Lane = Instance.Lane.getExpr(Builder, VF);
// TODO: Cache created scalar values.		// TODO: Cache created scalar values.
return Builder.CreateExtractElement(VecPart,		return Builder.CreateExtractElement(VecPart, Lane);
Builder.getInt32(Instance.Lane));
}		}
return Callback.getOrCreateScalarValue(VPValue2Value[Def], Instance);		return Callback.getOrCreateScalarValue(VPValue2Value[Def], Instance);
}		}
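
The question above about `mapToCacheIndex` concerns how a lane that is only known at runtime can still be cached in a fixed-size table. The body of `mapToCacheIndex` is not shown in this part of the diff, so the mapping below is an assumption made purely for illustration: lanes counted from the start keep their offset, and "scalable last" lanes are shifted past the known minimum VF so the two kinds never share a slot. `Lane`, `cacheIndex` and `ScalarCache` are hypothetical names, and `int` values (with 0 meaning "no entry") stand in for cached IR values.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for VPLane and its kinds.
enum class LaneKind { First, ScalableLast };

struct Lane {
  unsigned Offset;
  LaneKind Kind;
};

// Assumed mapping (not taken from the diff): First lanes use slots
// [0, KnownMin), ScalableLast lanes use slots [KnownMin, 2 * KnownMin).
unsigned cacheIndex(const Lane &L, unsigned KnownMin) {
  switch (L.Kind) {
  case LaneKind::First:
    return L.Offset;
  case LaneKind::ScalableLast:
    return KnownMin + L.Offset;
  }
  std::abort(); // stand-in for llvm_unreachable
}

// Minimal per-part scalar cache that grows on demand, in the spirit of the
// while/push_back loops in VPTransformState::set.
struct ScalarCache {
  std::vector<int> Slots;

  void set(unsigned Idx, int V) {
    if (Slots.size() <= Idx)
      Slots.resize(Idx + 1, 0); // 0 marks an empty slot in this sketch
    Slots[Idx] = V;
  }

  int get(unsigned Idx) const { return Idx < Slots.size() ? Slots[Idx] : 0; }
};
```

Under this assumed mapping, indexing the cache with the raw lane offset would make lane 3 counted from the start and lane 3 of the "scalable last" kind collide, which is the kind of mismatch the review comment appears to be guarding against.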

BasicBlock *		BasicBlock *
VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {		VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
// BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.		// BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
// Pred stands for Predessor. Prev stands for Previous - last visited/created.		// Pred stands for Predessor. Prev stands for Previous - last visited/created.
[...]	void VPRegionBlock::execute(VPTransformState *State) {
// Enter replicating mode.		// Enter replicating mode.
State->Instance = VPIteration(0, 0);		State->Instance = VPIteration(0, 0);

for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part) {		for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part) {
State->Instance->Part = Part;		State->Instance->Part = Part;
assert(!State->VF.isScalable() && "VF is assumed to be non scalable.");		assert(!State->VF.isScalable() && "VF is assumed to be non scalable.");
for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;		for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;
++Lane) {		++Lane) {
State->Instance->Lane = Lane;		State->Instance->Lane.set(Lane, VPLane::Kind::First);
		sdesmalen: nit: it would be nice if you can write `State->Instance->Lane = VPLane(Lane, VPLane::Kind::First);` (e.g. by providing `VPLane(const VPlane &Other)` in favour of having a `set` method).
// Visit the VPBlocks connected to \p this, starting from it.		// Visit the VPBlocks connected to \p this, starting from it.
for (VPBlockBase *Block : RPOT) {		for (VPBlockBase *Block : RPOT) {
LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');		LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
Block->execute(State);		Block->execute(State);
}		}
}		}
}		}

[...]

llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll

This file was added.

				; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -mattr=+sve -S < %s 2>%t | FileCheck %s

				sdesmalen: nit: passing the attribute in the command is redundant if it's already set by the IR attributes.
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				define void @inv_store_last_lane(i32* noalias nocapture %a, i32* noalias nocapture %inv, i32* noalias nocapture readonly %b, i64 %n) #0 {
				; CHECK-LABEL: @inv_store_last_lane
				; CHECK: vector.body:
				; CHECK: store <vscale x 4 x i32> %[[VEC_VAL:.*]], <
				; CHECK: middle.block:
				; CHECK: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
				; CHECK-NEXT: %[[LAST_LANE:.*]] = add i32 %[[VSCALE2]], -1
				; CHECK-NEXT: %{{.*}} = extractelement <vscale x 4 x i32> %[[VEC_VAL]], i32 %[[LAST_LANE]]

				entry:
				br label %for.body

				fhahn: not needed?
				for.body: ; preds = %for.body.lr.ph, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				fhahn: nit: the names of the blocks could be improved.
				david-arm (author): OK, to be honest I don't really know what they should be called. :) This is the name that LLVM generates. How about `for.cond.pre-cleanup`?
				fhahn: how about something like just `exit`. It can also directly return; you don't need the `%mul.lcssa` phi I think, LV will insert them if needed.
				%0 = load i32, i32* %arrayidx, align 4
				%mul = shl nsw i32 %0, 1
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				store i32 %mul, i32* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body, !llvm.loop !0

				exit: ; preds = %for.body
				%arrayidx5 = getelementptr inbounds i32, i32* %inv, i64 42
				store i32 %mul, i32* %arrayidx5, align 4
				ret void
				}

				define float @ret_last_lane(float* noalias nocapture %a, float* noalias nocapture readonly %b, i64 %n) #0 {
				; CHECK-LABEL: @ret_last_lane
				; CHECK: vector.body:
				; CHECK: store <vscale x 4 x float> %[[VEC_VAL:.*]], <
				; CHECK: middle.block:
				; CHECK: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
				; CHECK-NEXT: %[[LAST_LANE:.*]] = add i32 %[[VSCALE2]], -1
				; CHECK-NEXT: %{{.*}} = extractelement <vscale x 4 x float> %[[VEC_VAL]], i32 %[[LAST_LANE]]

				entry:
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%mul = fmul float %0, 2.000000e+00
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %mul, float* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body, !llvm.loop !6

				exit: ; preds = %for.body, %entry
				ret float %mul
				}

				attributes #0 = { "target-cpu"="generic" "target-features"="+neon,+sve" }

				!0 = distinct !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.mustprogress"}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!4 = !{!"llvm.loop.interleave.count", i32 1}
				!5 = !{!"llvm.loop.vectorize.enable", i1 true}
				!6 = distinct !{!6, !1, !2, !3, !4, !5}

llvm/test/Transforms/LoopVectorize/extract-last-veclane.ll

This file was added.

				; RUN: opt -loop-vectorize -dce -instcombine -S < %s 2>%t | FileCheck %s

				define void @inv_store_last_lane(i32* noalias nocapture %a, i32* noalias nocapture %inv, i32* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @inv_store_last_lane
				; CHECK: vector.body:
				; CHECK: store <4 x i32> %[[VEC_VAL:.*]], <
				; CHECK: middle.block:
				; CHECK: %{{.*}} = extractelement <4 x i32> %[[VEC_VAL]], i32 3

				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%mul = shl nsw i32 %0, 1
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				store i32 %mul, i32* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body, !llvm.loop !0

				exit: ; preds = %for.body
				%arrayidx5 = getelementptr inbounds i32, i32* %inv, i64 42
				store i32 %mul, i32* %arrayidx5, align 4
				ret void
				}

				define float @ret_last_lane(float* noalias nocapture %a, float* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @ret_last_lane
				; CHECK: vector.body:
				; CHECK: store <4 x float> %[[VEC_VAL:.*]], <
				; CHECK: middle.block:
				; CHECK: %{{.*}} = extractelement <4 x float> %[[VEC_VAL]], i32 3

				entry:
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%mul = fmul float %0, 2.000000e+00
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %mul, float* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %exit, label %for.body, !llvm.loop !6

				exit: ; preds = %for.body, %entry
				ret float %mul
				}

				!0 = distinct !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.mustprogress"}
				fhahn: Is there reason to use the metadata here? Can we instead just use the `-force-vector-width` option, which should be a bit simpler and the expected VF is clear from the run line?
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 false}
				!4 = !{!"llvm.loop.interleave.count", i32 1}
				!5 = !{!"llvm.loop.vectorize.enable", i1 true}
				!6 = distinct !{!6, !1, !2, !3, !4, !5}

Revision Contents

Diff 323781
