This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] NFC: Move UserVF feasibility checks to separate function.
AbandonedPublic

Authored by sdesmalen on Feb 4 2021, 5:16 AM.

Details

Summary

This patch is NFC and cleans up computeFeasibleMaxVF() by moving all
validity checking (and possibly clamping) of UserVF to a separate function
computeFeasibleUserVF().

This patch is a preparatory patch with the ultimate goal of making
computeMaxVF() return both a max fixed VF and a max scalable VF,
so that selectVectorizationFactor() can pick the most cost-effective
vectorization factor.

Diff Detail

Event Timeline

sdesmalen created this revision.Feb 4 2021, 5:16 AM
sdesmalen requested review of this revision.Feb 4 2021, 5:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2021, 5:16 AM
sdesmalen updated this revision to Diff 322655.Feb 10 2021, 4:53 AM
sdesmalen added reviewers: c-rhodes, fhahn, evandro, gilr.
sdesmalen set the repository for this revision to rG LLVM Github Monorepo.
c-rhodes added inline comments.Feb 12 2021, 6:52 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1608–1610
/// \return UserVF if it is non-zero and there are no dependences, otherwise
/// return a clamped value. For a scalable UserVF, the resulting feasible VF
/// may be fixed-width.
5557–5559

should this be part of the lambda since it's not used elsewhere? Not sure if it's an issue but MinBWs could now be computed where it previously wasn't.

5561

should functions start with lower case?

5713

if (UserVF.isZero())?

5783–5784

can this be moved above MaxSafeVectorWidthInBits?

5788–5798

it's nice how you've reused the remark for the debug message above, not really important for this patch but it would be good to do the same thing here

sdesmalen updated this revision to Diff 323991.Feb 16 2021, 6:39 AM
sdesmalen marked 4 inline comments as done.

Addressed comments and rebased after D95245 landed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5561

GetFeasibleMaxVF is also a local variable (with a function as value) so I thought it should start with upper case. But it seems the code-base is a bit undecided; in a quick grep I counted slightly (15%) more cases starting with an upper-case, than starting with a lower-case.

5788–5798

Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).

c-rhodes accepted this revision.Feb 16 2021, 7:28 AM

LGTM, might be worth leaving a little time before landing incase others have any comments, cheers

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5561

GetFeasibleMaxVF is also a local variable (with a function as value) so I thought it should start with upper case. But it seems the code-base is a bit undecided; in a quick grep I counted slightly (15%) more cases starting with an upper-case, than starting with a lower-case.

Ah fair enough, thanks for checking.

5788–5798

Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).

I agree that would be better in a separate patch considering the changes to tests required.

This revision is now accepted and ready to land.Feb 16 2021, 7:28 AM
fhahn added inline comments.Feb 18 2021, 12:59 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1609

it also checks if the reductions are supported for scalable vectors, right? Probably better to not go into the specifics in the comment and just say \return UserVF directly if it is valid. Otherwise clamp UserVF to the largest valid value.

5566

nit: the code might be slightly easier to follow if you just directly return if there is a valid UserVF, like

// If there is a valid UserVF, use it.
if (auto UserVF = computeFeasibleUserVF(...))`
   return UserVF.getValue();

return computeFeasibleMaxVF()
sdesmalen updated this revision to Diff 324569.Feb 18 2021, 2:24 AM
sdesmalen marked 2 inline comments as done.

Addressed comments.

fhahn added a comment.EditedFeb 18 2021, 2:49 PM

I had a look at the next patch and I am wondering if it would be possible to avoid splitting out the UserVF handling (which in turn requires splitting out the code to limit the VF by the the maximum safe VF, D96022). Unfortunately the code in computeFeasibleMaxVF has not been the most straight-forward, even before adding scalable support and I am a bit worried things will get harder to follow if the code is spread out more.

I think it might be simpler to operate in 3 stages in computeFeasibleMaxVF: 1) compute maximum fixed & scalable VFs, 2) use computed max VFs to pick/limit UserVF, if provided and 3) pick MaxVF based on registers, limit by computed MaxVF. I sketched out this approach in D96997.

I have not looked at the other patches in this stack in detail, so I might be missing something that makes things more complicated. But to pick a scalable VF, we'd just have to split out the logic to compute the MaxVectorSize (something like computeMaxVFBasedOnRegisters) and then call it for both scalable & fixed width vectors and limit by the computed maximum VFs. I think that would be similar to what's done in D96023, but overall I think it might be simpler than the clampFeasibleMaxVF approach, because less parameters need to be passed around and we also avoid re-computing the max VFs to clamp with.

Hi @fhahn, thanks for looking into these two patches.

Today the code in computeFeasibleMaxVF is roughly structured as:

if (Need to care about UserVF) {
  // Validate, clamp or fall back to fixed-width or scalars.
  // Meanwhile, giving various UserVF-specific remarks/debug output.
} else
  // Determine MaxFeasibleVF some other way.

Because the function has gotten so big and have practically non-overlapping code paths, I don't see any downside in splitting this code into separate functions to handle their respective cases.

My main reason for doing these two NFC patches first is that the code-paths for fixed- and scalable max VFs would be identical, i.e. no need to maintain separate variables for Max(Scalable|Fixed)VF, or have lots of conditional code based on whether the UserVF or MaxVF is scalable. With these NFC tweaks, scalable vectors just fit in naturally without a lot of specific 'scalable' changes (see D96023 which extends the function to work on scalable vectors, by calling a different interface to determine the WidestRegister. Not really any other scalable-specific changes are needed).

Patch D96022 moves the clamping operation to a separate function, so that it works for both scalable and fixed-width vectors (based on MaxVscale), making it one of the few only places that need to be very scalable-specific. The interface is generic though, it only asks to clamp "some VF" to some "num elements". computeFeasibleUserVF then only emits different remarks/debug-output based on the clamped value.
After this bit of cleanup, the code-paths for both computeMaxUserVF and computeFeasibleMaxVF do something very similar to what you suggested in D96997:

  • Start of with a large vector size (based on register width)
  • Clamp the vector size to max number of elements <legal max or tripcount>
  • Pick MaxVF.

I find the code in D96997 quite difficult to follow. Not necessarily because of the changes you made, I already find the code difficult to follow for the fixed-width case alone, but adding the scalable case just adds to the complication. Things like:

if (some condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
  UserVF = ElementCount::getScalable(0)
}

:

if (some other condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
}

make me weary for example. I think having the code split up makes the code a lot easier to follow than having the combination {UserVF, auto-select} x {Fixed, Scalable} in one function. After D96021 and D96022, practically all code-paths are the same for both the fixed- and scalable case.

I have not looked at the other patches in this stack in detail, so I might be missing something that makes things more complicated. But to pick a scalable VF, we'd just have to split out the logic to compute the MaxVectorSize (something like computeMaxVFBasedOnRegisters) and then call it for both scalable & fixed width vectors and limit by the computed maximum VFs. I think that would be similar to what's done in D96023, but overall I think it might be simpler than the clampFeasibleMaxVF approach, because less parameters need to be passed around and we also avoid re-computing the max VFs to clamp with.

I'd be happy to see if more things can be simplified once things work for scalable vectors, but I don't really see many things that are recomputed after D96023?

sdesmalen updated this revision to Diff 327893.Mar 3 2021, 1:46 PM

Moved calculation for MaxSafeElements out of getFeasibleUserVF.

Hi @fhahn, following your feedback, I've refactored the patches in this series a bit to move out some common calculations from getFeasibleUserVF and computeFeasibleMaxVF. I've also tried to implement your idea to start off with a very wide vector, that gets subsequently limited by: loop dependence-distance, vector-register-width, iteration count, etc. You'll see in D96023 that the code is now relatively straight-forward in how it limits the MaxVF, and how it works for both fixed- and scalable vectors. You'll also notice that computeFeasibleMaxVF no longer takes an explicit bool ComputeScalableMaxVF flag.

Hopefully this is more along the lines that you had in mind!

Patches D96021 and D96022 just move some code around, which I think makes the code quite a bit more readable. Specifically for this patch, if you have no objections I'd be quite eager to land this patch, since it just moves some existing functionality into its own function which I think is an immediate improvement. Discussion on how things will end up for scalable vectors can then be done on the subsequent patches.

fhahn added a comment.Mar 4 2021, 2:19 PM

Hi @fhahn, thanks for looking into these two patches.

Hi, sorry for the delay.

Today the code in computeFeasibleMaxVF is roughly structured as:

if (Need to care about UserVF) {
  // Validate, clamp or fall back to fixed-width or scalars.
  // Meanwhile, giving various UserVF-specific remarks/debug output.
} else
  // Determine MaxFeasibleVF some other way.

Because the function has gotten so big and have practically non-overlapping code paths, I don't see any downside in splitting this code into separate functions to handle their respective cases.

Yes that's indeed a problem, but I think that was mostly a consequence of adding scalable support for UserVFs only. Before it was much simpler IIRC and structured similar to outlined earlier (compute max bound, apply to UserVF if present otherwise apply bound when computing max VF).

I find the code in D96997 quite difficult to follow. Not necessarily because of the changes you made, I already find the code difficult to follow for the fixed-width case alone, but adding the scalable case just adds to the complication. Things like:

if (some condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
  UserVF = ElementCount::getScalable(0)
}

:

if (some other condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
}

make me weary for example. I think having the code split up makes the code a lot easier to follow than having the combination {UserVF, auto-select} x {Fixed, Scalable} in one function. After D96021 and D96022, practically all code-paths are the same for both the fixed- and scalable case.

Agreed, that was not ideal, but I think not necessarily due to the new structure, but because it inlined the upper bound computation for scalable VFs, which is much more verbose than for fixed vectors (BTW I think the extra debug messages are great!). I did a quick pass over the patch to move the scalable VF upper bound computation to a separate function, so those separate checks should be gone now.

Hi @fhahn, following your feedback, I've refactored the patches in this series a bit to move out some common calculations from getFeasibleUserVF and computeFeasibleMaxVF. I've also tried to implement your idea to start off with a very wide vector, that gets subsequently limited by: loop dependence-distance, vector-register-width, iteration count, etc. You'll see in D96023 that the code is now relatively straight-forward in how it limits the MaxVF, and how it works for both fixed- and scalable vectors. You'll also notice that computeFeasibleMaxVF no longer takes an explicit bool ComputeScalableMaxVF flag.

Hopefully this is more along the lines that you had in mind!

Thanks for the update, I just had a look. One thing I am not sure about is whether we want to emit the same debug messages & remarks for scalable vectors when we just compute the max scalable VF, without UserVF? Personally I think it would be helpful to emit them independently of whether UserVF is used or not, if scalable vectorization is enabled. For example, if there's a dependence that prevents scalable vectorization due to the MaxVScale computation, I think it would make sense to display the remark/debug output saying this is causing scalable vectorization to be disabled.

As alluded to earlier, I am not sure if splitting up the logic in getFeasibleUserVF and computeFeasibleMaxVF is ideal. The way I see it, we have to apply the same 'rules'/'restrictions' to both the UserVF and/or the max VF we compute. With that in mind, the UserVF handling should be really simple: we just have to check if it is within the upper bound. For scalable vectors, we also have the fallback. But this may just be temporary, because once we also pick a max scalable VF, we can instead proceed with ignoring the UserVF, same as we do for fixed vectors.

I think the main advantages of the approach outlined earlier are:

  • There's one place we apply the legality restrictions to the upper bound, which is then used for both the computed max VF and UserVF. I think the advantage here is that it is clearer from the code that the same rules apply in both cases (max VF & UserVF) and if we need to add a new constraint, there's a single place to add it.
  • The 'clamping' is done by a simple minimum function, whereas clampFeasibleMaxVF contains scalable vector specific code and also seems slightly more complicated (e.g. needs an extra optional out parameter)
  • The same debug messages &Remarks are generated when scalable vectorization is enabled.
  • Maybe a bit less indirection/complex function calls.

In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

I updated D96997 a bit to better sketch out how it could integrate with computing scalable max VFs. It's still not too polished, it should mainly illustrate the overall structure and there's definitely lots of small things that could be improved further.

Another thing I am not entirely sure is the reduction checks. For now it seems like for AArch64 the VF is not really used, so in the patch it just passes any scalable VF. not sure if that might change in the future, but I think I saw in one of the comments mentioning that we might to remove VFs after computing the max VF anyways.

Hi @fhahn,

[..]
In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

Yes, thanks for this, that was helpful! My original reason for splitting the patch series out like this was to make some trivial NFC "move code around" changes that were relatively simple to review, before making functional changes for scalable vectors. After seeing more clearly what you mean, I agree with you that the code becomes a bit more readable by being a bit more rigorous in the refactoring, and it also simplified the follow-up patches a bit.

Long story short, I have updated my patch series to align it more with the approach that you outlined in D96997. My new patch is in D98509 and I'll abandon this series.

Another thing I am not entirely sure is the reduction checks. For now it seems like for AArch64 the VF is not really used, so in the patch it just passes any scalable VF. not sure if that might change in the future, but I think I saw in one of the comments mentioning that we might to remove VFs after computing the max VF anyways

I initially thought we could discard individual VF candidates that don't match the "can this vectorize" criteria. This may actually be too fine grained because of how VFRange is used to specify a range of VFs, making it difficult to discard individual VFs as candidates. As you say, for SVE it doesn't look at the specific VF anyway. For now, I've added a FIXME around it, indicating this will change in the future.

With that in mind, the UserVF handling should be really simple: we just have to check if it is within the upper bound. For scalable vectors, we also have the fallback. But this may just be temporary, because once we also pick a max scalable VF, we can instead proceed with ignoring the UserVF, same as we do for fixed vectors.

Yes, I have actually implemented that behaviour in the new patch. If a scalable UserVF is not valid, it should not fall back on a less-wide scalable VF, or a VF with the same minimum number of lanes as the UserVF and drop the 'scalable' flag, as it does now. It should instead ignore the UserVF, and do what it would otherwise do: pick a suitable VF that is most cost-effective.

Thanks again!

Abandoned in favour of D98509

sdesmalen abandoned this revision.Mar 12 2021, 7:36 AM
fhahn added a comment.Mar 21 2021, 2:36 PM

Hi @fhahn,

[..]
In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

Yes, thanks for this, that was helpful! My original reason for splitting the patch series out like this was to make some trivial NFC "move code around" changes that were relatively simple to review, before making functional changes for scalable vectors. After seeing more clearly what you mean, I agree with you that the code becomes a bit more readable by being a bit more rigorous in the refactoring, and it also simplified the follow-up patches a bit.

Long story short, I have updated my patch series to align it more with the approach that you outlined in D96997. My new patch is in D98509 and I'll abandon this series.

Thanks! I would have been happy to update my patch, but it's great you put up an update patch. I'll take a closer look early in the coming week!