Page MenuHomePhabricator

[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector
ClosedPublic

Authored by david-arm on Jan 21 2021, 7:44 AM.

Details

Summary

There are certain loops like this below:

for (int i = 0; i < n; i++) {
  a[i] = b[i] + 1;
  *inv = a[i];
}

that can only be vectorised if we are able to extract the last lane of the
vectorised form of 'a[i]'. For fixed width vectors this already works since
we know at compile time what the final lane is, however for scalable vectors
this is a different story. This patch adds support for extracting the last
lane from a scalable vector using a runtime determined lane value. I have
added support to VPIteration for non-constant lanes that still permits the
caching of values. Whilst doing this work I couldn't find any explicit tests
for extracting the last lane values of fixed width vectors so I added tests
for both scalable and fixed width vectors.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
david-arm added inline comments.Jan 26 2021, 8:20 AM
llvm/lib/Transforms/Vectorize/VPlan.h
184

I'm happy with the idea of adding an extra member to VPInstance that contains an enum and is probably nicer than what I have now! I'm not sure if we need a LK_ScalableFirst though as this is always known at compile time to be 0 I think - perhaps we just need a LK_First and a LK_ScalableLast? Also, are suggesting that the enum describes how to use Lane, i.e.

if (StartFromFirst)

Index = Lane

else if (StartFromLast)

Index = NumElts - 1 - Lane?

or with LK_ScalableLast do you literally mean the last lane of the vector?

david-arm updated this revision to Diff 319853.Jan 28 2021, 6:37 AM
  • Following @sdesmalen's suggestion I've added an enum to VPIteration that describes how we should interpret the 'Lane' member variable. This means we can support offsets from the last lane up to the known minimum number of elements.
sdesmalen added inline comments.Jan 28 2021, 7:30 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

should this be implied by Instance.Lane > 0? (implemented with operator overload, Instance > 0)

2559–2565

nit:

switch (Instance.Kind) {
case VPIteration::LK_First:
  Lane = Builder.getInt32(Instance.Lane);
  break;
case VPIteration::LK_ScalableLast:
  Lane = Builder.CreateSub(...);
  break;
}

(without default). That will give a compile-time warning when a new kind is added.

4460

Should Lane not be set to VF.getKnownMinValue() - 1 for VF.isScalable() as well?

llvm/lib/Transforms/Vectorize/VPlan.h
180–185

Can you make this a method to VPIteration, e.g. getIndex()

182

This only needs to be 2 * VF.getKnownMinValue() if VF is scalable.

182–189

How about:

/// LaneKind describes how to interpret Lane.
/// For LK_First, Lane is the index into the first N elements of a fixed-vector <N x <etltty>> or a scalable vector <vscale x N x <eltty>>.
/// For LK_ScalableLast, Lane is the index into the last N elements of a scalable vector <vscale x N x  <eltty>>
184

I indeed meant that LaneKind indexes the first or last 'chunk' in the vector, e.g. v0, v1 for the first, and vN-2, vN-1 for the last in:

<vscale x 2 x i32> <=> <elt0, elt1 |  elt2, elt3 | ... | eltN-2, eltN-1>
189

use LK_First and drop the default, so that you get a compile-time warning if a new kind is added to the enum.

david-arm added inline comments.Jan 28 2021, 7:43 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4460

Not in the way I've defined it, i.e. LastLane - Lane. This is a bit similar to how the offset is defined for Cullen's vector.splice intrinsic. A Lane of 0 is the same as LastLane - this does make the calculation of the runtime lane a bit easier.

llvm/lib/Transforms/Vectorize/VPlan.h
180–185

Sure. I did originally think about that, but then I wondered if that only really makes sense in the context of a cache? For example, if I move this to VPIteration then I think I should also move getNumCachedLanes() there for consistency otherwise it's a bit odd having the cache size defined in one class and the mapping to a cache index in another.

182–189

Sure, although I think the way I've defined LK_ScalableLast is not quite the same as you described above. The calculation of lane in my patch is:

LaneFromStartOfVec = LastLaneOfVec - Lane.

fhahn added inline comments.Jan 28 2021, 7:52 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
261

can you add support here as well? The callback is going away soon, so we also need to support the non-callback version.

llvm/lib/Transforms/Vectorize/VPlan.h
180–185

I think both should be moved to VPIteration, as we need to support both VPIteration versions there as well.

191

are you planning on adding more kinds here? otherwise this can just be a boolean? Or make this an enum class?

198

Can this have a better name, e.g. in line with the enum value or the Boolean variable name, if you change it?.

llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗(On Diff #319853)

This test does not need to be neon specific, right? extracting the last lane for fixed vectors should be tested fairly well already, so not sure if the test is needed at all?

llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
23

not needed?

26

nit: the names of the blocks could be improved.

Hi @fhahn thanks for the review!

llvm/lib/Transforms/Vectorize/VPlan.cpp
261

OK, the reason I didn't add this originally is because I cannot test this code path in my patch. I thought it might be bad practice to add code without testing it, but I'm happy to add support here if you want. I guess we do have a test for it, so when the callback is removed if there is a bug it will break the test.

llvm/lib/Transforms/Vectorize/VPlan.h
191

I don't have plans to add other kinds here at the moment - I chose an enum here to make it extensible should people wish to add other kinds in future and based on earlier reviewer comments. I'm happy to change it to enum class or use a boolean - @sdesmalen don't know if you have a preference here?

llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗(On Diff #319853)

Similar to what I mentioned in the commit message, I deliberately added a llvm_unreachable() in the code that I've changed and I found no explicit tests for this at all. There is some limited coverage, but accidentally so in cases where the test was actually trying to test something else. How about I move this to a generic place?

llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
26

OK, to be honest I don't really know what they should be called. :) This is the name that LLVM generates. How about for.cond.pre-cleanup?

fhahn added inline comments.Jan 28 2021, 8:14 AM
llvm/test/Transforms/LoopVectorize/AArch64/neon-extract-last-veclane.ll
3 ↗(On Diff #319853)

Sounds good to me

llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
26

how about something like just exit. It can also directly return; you don't need the %mul.lcssa phi I think, LV will insert them if needed.

david-arm updated this revision to Diff 320116.Jan 29 2021, 5:34 AM
  • Changed some comments in VPIteration.
  • Moved getNumCachedLanes() and mapLaneToCacheIndex() to VPIteration.
  • Changed enum LaneKind to enum class LaneKind, which required introducing constructors in a previous patch.
david-arm marked 16 inline comments as done.Jan 29 2021, 5:38 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

I thought that adding an operator overload to struct VPIteration seemed a bit unnecessary for just this one case. The way I've defined LK_ScalableFirst to work means that the pair (Instance.Lane=0,Instance.Kind=LK_ScalableFirst) actually refers to the last element of the vector, i.e. LastLane - Instance.Lane.

llvm/lib/Transforms/Vectorize/VPlan.h
189

Sadly this way I also get a compile-time warning about ending the function without returning a value, which means I still have to add a default here.

198

Hi @fhahn It's hard to come up with a much better name - how about "isKnownLane"? This terminology is used throughout the codebase to mean something that is known at compile time, e.g. ElementCount::getKnownMinValue()

david-arm updated this revision to Diff 321081.Feb 3 2021, 6:13 AM
david-arm marked 3 inline comments as done.
  • Rebase.
sdesmalen added inline comments.Mon, Feb 8, 9:18 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

Okay, fair enough on not overloading the operator.

I'm a bit concerned about exposing Lane directly though, because this now means you'll need to check Instance.isKnownLane() to know whether this concerns the first lane, or the last. Any code that forgets to check this, may make the wrong assumptions.

I'd suggest making Lane private, and adding interfaces such as:

  • getAsFirst(int Lane)
  • getAsScalableLast(int Lane)

which asserts that the lane starts at the beginning (getFromFirst) of from the back of a scalable vector (getFromScalableLast).
It would also remove the need for additional asserts you added to check if the VPIteration Instance is a known lane.

2557–2558

nit:

assert((!Instance.isFirstLane() ||
        Cost->isUniformAfterVectorization(cast<Instruction>(V), VF))
          && "Uniform values only have lane zero");

?

4458

Can you restructure this so that either:

  • Lane/Kind are explicitly set for both Fixed and Scalable case. Or alternatively:
  • Both Lane and Kind are initialized for Fixed, and only updated for scalable.
4460

What I don't like about that definition is that the representation in the LoopVectorizer is entirely different for scalable and fixed-width vectors. {0, First} and {0, LastScalable} mean the first and last element, respectively. {3, First} means the last for <4 x i32> and {3, LastScalable} means 4th last element (for <vscale x 4 x i32>). I think this is confusing and error prone. If First/ScalableLast would mean the first and last scalable "chunk" of a vector, with the index being the index as normal, then the last iteration would be described as {3, First} for <4 x i32> and {3, ScalableLast} for <vscale x 4 x i32>, and so the representation in the LoopVectorizer is more or less the same. Code-generating it is slightly different of course, but there's probably only a single place where that has to happen.

david-arm added inline comments.Mon, Feb 8, 9:33 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4460

OK, I'll have a think about it and see code changes would be required. I think in terms of code-generating the lane index, starting from the end is easier for the developer as the most common case will be the very last lane. That's why in my original patch I'd only catered specifically for the very last lane. If we define it in terms of chunks (really just subvectors I think) I might change some of the interfaces and names to make it clear that we're dealing with lane indices from the start of what is essentially a subvector.

david-arm added inline comments.Wed, Feb 10, 7:06 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

I'm not sure about having enum specific getAsXX functions to be honest. Having to add a new function for any new types people want to add in the future seems a bit inflexible. However, I'll try to find a way of doing something better here if possible. I do take your point about making member variables private, but that will require another NFC patch to change struct to class and adding get/set variables first. It won't be a small change. :)

david-arm updated this revision to Diff 323781.Mon, Feb 15, 9:10 AM
  • Added a new VPLane class to contain all the lane offset and kind, plus add methods getKnownValue and getExpr to return the compile-time value and runtime expression, respectively.
  • Changed the definition of ScalableLast to Sander's suggestion where the lane value is the offset from the start of the last subvector.
  • Folded some asserts about the lane value into VPLane::getKnownValue() and VPLane::mapToCacheIndex.
david-arm marked 4 inline comments as done.Mon, Feb 15, 9:11 AM
Matt added a subscriber: Matt.Tue, Feb 16, 9:05 AM

Thanks for the changes @david-arm, I think this is moving in the right direction.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

nit: Can you maybe write this as Instance.Lane.isFirst()? Personally I find the difference between isFirstLane and isFirstIteration a bit confusing. (same for other places)

4456–4462

is it worth creating a static VPIteration::getLastLaneForVF(ElementCount) for this?

llvm/lib/Transforms/Vectorize/VPlan.cpp
69

don't use default. Cover both cases explicitly, so that if another enum value is added, the compiler will emit a diagnostic this case is not covered.

llvm/lib/Transforms/Vectorize/VPlan.h
103

nit: if you put the comments above each enum-value, doxygen generates a nice table with a comment describing what each enum-value means.
See for example ARMLdStMultipleTiming in ARMSubtarget.h (and the generated doxygen here: https://llvm.org/doxygen/classllvm_1_1ARMSubtarget.html#ac7324b67d7e3be270177e6590f0bb1e5)

113

I'd prefer this to just be named Lane, because it is still a lane. The First or ScalableLast tells in which chunk of the vector this lane lives.

123

nit: getKnownLane.

130

nit: getLaneAsRuntimeExpr ?

david-arm added inline comments.Thu, Feb 18, 1:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2557–2558

Sure. I added isFirstIteration at your suggestion because it refers to Part=0 as well as Lane=0.

llvm/lib/Transforms/Vectorize/VPlan.cpp
69

OK I can do that - it just might mean adding an initialiser to Lane at the start of the function. I can't return directly from a case statement without a default as the compiler warns about functions returning void otherwise.

fhahn added inline comments.Thu, Feb 18, 1:45 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
69

can you just put llvm_unreachable after the switch to get rid of the warning?

david-arm added inline comments.Thu, Feb 18, 1:50 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
69

Yeah I can do that if that's the preferred way?

david-arm updated this revision to Diff 325002.Fri, Feb 19, 9:05 AM
  • Renamed and removed some interfaces.
  • Removed default case in getAsRuntimeExpr, but had to add llvm_unreachable() as a result in order to kill the resulting compiler warning.
  • Add VPLane::getLastLaneForVF().
david-arm marked 8 inline comments as done.Fri, Feb 19, 9:06 AM
sdesmalen added inline comments.Tue, Feb 23, 3:17 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2977

nit: maybe add a static VPIteration::getFirstLane? (seems to be useful in several places).

llvm/lib/Transforms/Vectorize/VPlan.cpp
261

should this use mapToCacheIndex ?

llvm/lib/Transforms/Vectorize/VPlan.h
186

nit: const VPLane &

188

nit: is isFirstIteration unused now?

david-arm updated this revision to Diff 326108.Wed, Feb 24, 8:40 AM
  • Rebase required changing VPTransformState since VectorizerValueMap has been removed.
  • Added VPLane::getFirstLane()
david-arm marked 4 inline comments as done.Wed, Feb 24, 8:41 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.h
188

It's still used in two places so I've left this in.

sdesmalen accepted this revision.Tue, Mar 2, 1:00 AM

Thanks for all the changes. The patch LGTM with nits addressed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4451

nit: Did you need to move the call to Builder.SetInsertPoint?

llvm/lib/Transforms/Vectorize/VPlan.cpp
445

nit: it would be nice if you can write State->Instance->Lane = VPLane(Lane, VPLane::Kind::First);
(e.g. by providing VPLane(const VPlane &Other) in favour of having a set method).

llvm/test/Transforms/LoopVectorize/AArch64/sve-extract-last-veclane.ll
2

nit: passing the attribute in the command is redundant if it's already set by the IR attributes.

This revision is now accepted and ready to land.Tue, Mar 2, 1:00 AM
fhahn added inline comments.Tue, Mar 2, 1:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1111

It looks like this is only used in llvm/lib/Transforms/Vectorize/VPlan.cpp in this patch? Should it be defined directly there? If it needs to be shared between multiple files it would probably be better to just put the declaration into a header?

llvm/test/Transforms/LoopVectorize/extract-last-veclane.ll
56

Is there reason to use the metadata here? Can we instead just use the -force-vector-width option, which should be a bit simpler and the expected VF is clear from the run line?

david-arm marked an inline comment as done.Wed, Mar 3, 12:27 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1111

It will be used by code in this file in other patches - for example, @CarolineConcatto currently has a patch that also uses this function. I can try to find a suitable header to put this in.

david-arm updated this revision to Diff 328117.Thu, Mar 4, 3:55 AM
david-arm marked 5 inline comments as done.
fhahn accepted this revision.Thu, Mar 4, 3:58 AM

LGTM thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
63

No need for extern here I think? Please also add a comment.