This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Ensure vector predication loop metadata is always emitted when pragma is specified.
ClosedPublic

Authored by malharJ on Jan 15 2021, 6:54 AM.

Details

Summary

This patch ensures that vector predication and vectorization width pragmas work together correctly/as expected.
Specifically, this patch fixes the issue that when vectorization_width > 1, the vector predication behaviour (this would matter
if it has NOT been disabled explicitly by a pragma) was getting ignored, which was incorrect.

The fix here removes the dependence of vector predication on the vectorization width.
The loop metadata corresponding to clang loop pragma vectorize_predicate is always emitted,
if the pragma is specified, even if vectorization is disabled by vectorize_width(1) or vectorize(disable).

Diff Detail

Event Timeline

malharJ created this revision.Jan 15 2021, 6:54 AM
malharJ requested review of this revision.Jan 15 2021, 6:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2021, 6:54 AM
SjoerdMeijer added inline comments.Jan 15 2021, 7:26 AM
clang/test/CodeGenCXX/pragma-loop-predicate.cpp
123

Do we also expect:

!{!"llvm.loop.vectorize.predicate.enable", i1 false}

here since the test sets:

vectorize_predicate(disable) vectorize_width(4)
malharJ added inline comments.Jan 15 2021, 7:34 AM
clang/test/CodeGenCXX/pragma-loop-predicate.cpp
123

GEN8 covers checking for that ?

I am trying to remember details here, but first about this:

vectorize_width(1) is also used, since that effectively disables vectorization.

I am not sure this is true (i.e. effectively disabling auto-vec) since in LangRef we specify:

The vector width is specified by vectorize_width(_value_[, fixed|scalable]), where _value_ is a positive integer and the type of vectorization can be specified with an optional second parameter.

So, I *think* vectorize_width(1) enables the vectoriser, just the width is set to 1. And this explains why I added that condition VectorizeWidth < 1.

This leads me to wonder what the actual problem is with this?

clang/test/CodeGenCXX/pragma-loop-predicate.cpp
123

Ah yep, of course, I missed that, so ignore this!

fhahn added a comment.Jan 18 2021, 3:52 AM

This ensures that the Clang loop pragma vectorize_predicate([enable|disable]) is ignored
when vectorize_width(1) is also used, since that effectively disables vectorization.

Could you elaborate *why* it vectorize_predicate should be ignored when vectorize_width(1)? In general, I do not think that Clang silently ignoring the pragma is great for the user experience. If we can detect cases where we know the pragma will be ignored, could we at least warn the user?

I had a look at the Clang Language Extension ... and I saw this:

Specifying a width/count of 1 disables the optimization, and is equivalent to vectorize(disable) or interleave(disable).

And regarding the original condition, it seems to check whether vector predication has been specified and vectorization is not disabled explicitly.
I think at the very least we should have Attrs.VectorizeWidth > 1 since it is being ANDed ?

if (Attrs.VectorizePredicateEnable != LoopAttributes::Unspecified &&
    Attrs.VectorizeEnable != LoopAttributes::Disable &&
     Attrs.VectorizeWidth < 1)

I put in

&& Attrs.VectorizeWidth != 1

since the Clang parser threw an error when passing a value of 0 for the width and I thought it would be good to indicate explicitly that we don't want equal to 1.

I had a look at the Clang Language Extension ... and I saw this:

Specifying a width/count of 1 disables the optimization, and is equivalent to vectorize(disable) or interleave(disable).

Ah yes, thanks. I think we should improve the langref spec also, this should be described in the same paragraph, not floating below some example.

With this cleared up, I agree with Florian, ignoring things isn't great.
But this brings me back to my question earlier: what is the problem with the current behaviour (i.e. not ignoring/handling it)?

malharJ added a comment.EditedJan 18 2021, 5:52 AM

what is the problem with the current behaviour (i.e. not ignoring/handling it)?

The current behaviour is ignoring/not handling the vectorize_predication pragma whenever vectorization width is non-zero.
Could you kindly see the original condition I stated in my previous comment ?

My patch is trying to fix this by handling the predication pragma for all cases (other than when vecotrization width = 1).
Please let me know if Im mistaken.

I believe this sentence is the important part (the width=1 is just an edge case):

For all other non-zero vectorization widths, the pragma is not ignored unless vectorization is explicitly disabled using vectorize(disable)

And this sounds like a sensible fix to me. As in - when the vector width is specified that should not mean the predicate(enabled/disabled) is ignored.

Can you expand the test a little for both predicate(enable) and disable?

clang/test/CodeGenCXX/pragma-loop-predicate.cpp
77

Can you add a test for vectorize_predicate(enable) vectorize_width(4)

malharJ updated this revision to Diff 317345.Jan 18 2021, 6:39 AM

added tests for loop predication enabled case.

Thanks. @fhahn @SjoerdMeijer what do we think about the edge case where the width==1? As far as I understand (with this patch):

#pragma clang loop vectorize_predicate(disable) vectorize_width(4)
Gives llvm.loop.vectorize.predicate.enable=false, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true

#pragma clang loop vectorize_predicate(disable) vectorize_width(4)
Gives llvm.loop.vectorize.predicate.enable=true, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true

#pragma clang loop vectorize_predicate(enable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=false

#pragma clang loop vectorize_predicate(disable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=false

Should we be adding llvm.loop.vectorize.predicate.enable metadata even when width=1? I would be inclined to emit the predication pragma anyway.

SjoerdMeijer added a comment.EditedJan 22 2021, 6:44 AM

Thanks. @fhahn @SjoerdMeijer what do we think about the edge case where the width==1? As far as I understand (with this patch):

#pragma clang loop vectorize_predicate(disable) vectorize_width(4)
Gives llvm.loop.vectorize.predicate.enable=false, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true

#pragma clang loop vectorize_predicate(disable) vectorize_width(4)

I guess this is a typo, and should be vectorize_predicate(enable)

Gives llvm.loop.vectorize.predicate.enable=true, llvm.loop.vectorize.width=4, llvm.loop.vectorize.scalable.enable=false, llvm.loop.vectorize.enable=true

#pragma clang loop vectorize_predicate(enable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=false

#pragma clang loop vectorize_predicate(disable) vectorize_width(1)
Gives llvm.loop.vectorize.width=1, llvm.loop.vectorize.scalable.enable=false

Should we be adding llvm.loop.vectorize.predicate.enable metadata even when width=1? I would be inclined to emit the predication pragma anyway.

So yeah, I got confused about this edge case width=1 earlier, but have refreshed my memory.
I would like to ask the question if it matters? Vectorize.width = 1 disables vectorisation, so unless I miss something, I don't think it matters if we emit llvm.loop.vectorize.predicate.enable=true or llvm.loop.vectorize.predicate.enable=false or if we don't emit it all, because it is ignored anyway? But if we have a combination that doesn't make sense, like width(1) and predicate(enable), should we emit a diagnostic if this is not already done?

malharJ updated this revision to Diff 322445.Feb 9 2021, 10:58 AM

This update removes the dependence of (emitting) vectorize_predicate metadata on whether vectorization
is disabled (via vectorize(disable) or vector_width(1)) or not.

This means that vectorize_predicate loop metadata is emitted whenever the pragma has been specified.

This makes sense to me. If the user specifies the pragma then we send that through to the vectorizer, whatever it is. As much as vectorize_width(1) interleave_count(4) vectorize_predicate(enable) doesn't make a lot of sense, it's still something that should work. And if the user asks for it, that's what should be done.

clang/test/CodeGenCXX/pragma-loop-predicate.cpp
88

Can you add another test for #pragma clang loop vectorize(disable) vectorize_predicate(enable). My understanding is that it will produce the same as vectorize_predicate(enable) vectorize_width(1)

malharJ added inline comments.Feb 11 2021, 2:38 AM
clang/test/CodeGenCXX/pragma-loop-predicate.cpp
88

I believe test5( ) above takes care of this ?

comparing the output of the two (test8 and test5), test8 produces an extra metadata :

{!"llvm.loop.vectorize.scalable.enable", i1 false}
dmgreen accepted this revision.Feb 11 2021, 4:51 AM

OK Thanks. LGTM

clang/test/CodeGenCXX/pragma-loop-predicate.cpp
88

Oh yeah. I didn't see the existing checks. Strange that it would not emit scalable then, but still doing the expected thing.

This revision is now accepted and ready to land.Feb 11 2021, 4:51 AM
malharJ retitled this revision from [Clang] Ensure vector predication pragma is ignored only when vectorization width is 1. to [Clang] Ensure vector predication pragma is emitted when specified..Feb 11 2021, 6:58 AM
malharJ edited the summary of this revision. (Show Details)
malharJ retitled this revision from [Clang] Ensure vector predication pragma is emitted when specified. to [Clang] Ensure vector predication loop metadata is always emitted when pragma is specified..Feb 11 2021, 7:01 AM
malharJ edited the summary of this revision. (Show Details)
malharJ added inline comments.Feb 11 2021, 7:07 AM
clang/test/CodeGenCXX/pragma-loop-predicate.cpp
88

I had a quick look at the logic for scalable, and it seems like it's only emitted when a fixed/scalable vectorization width has been specified.

I do not have commit access. Can someone with commit access please commit this patch ?

This makes sense to me. If the user specifies the pragma then we send that through to the vectorizer, whatever it is. As much as vectorize_width(1) interleave_count(4) vectorize_predicate(enable) doesn't make a lot of sense, it's still something that should work. And if the user asks for it, that's what should be done.

Unfortunately, it is not that simple. How vectorize metadata is emitted, how it is interpreted by the LoopVectorize pass, and when the WarnMissedTransformations pass emits a warning is non-trivial and I would not mess with it with no reason or if we generally wanted to clean this up. For instance, is implicitly enabling the vectorizer when the width is set the front-end's jobs or the LoopVectorizers? The function hasVectorizeTransformation in LoopUtils.h tries to abstract from that. That is, there is no 1-to-1 relationship between loop hint pragmas and loop metadata as you seem to assume.

The summary motivates this as "This ensures better feedback for the user/programmer." However, the user will not see any difference since the LoopVectorize will just ignore it. Worse, the IR contains contradicting information. IMHO, it's the font-end's job to resolve semantic inconsistencies and potentially warn users about it.

I don't intend to block this patch because it seems to have no consequence, but the motivation is insufficient. If the author uses the predicate in an upstream pass they should mention it.

To illustrate how complicated interpreting the vectorize pragma has become in the mid-end:

TransformationMode llvm::hasVectorizeTransformation(Loop *L) {
  Optional<bool> Enable =
      getOptionalBoolLoopAttribute(L, "llvm.loop.vectorize.enable");

  if (Enable == false)
    return TM_SuppressedByUser;

  Optional<ElementCount> VectorizeWidth =
      getOptionalElementCountLoopAttribute(L);
  Optional<int> InterleaveCount =
      getOptionalIntLoopAttribute(L, "llvm.loop.interleave.count");

  // 'Forcing' vector width and interleave count to one effectively disables
  // this transformation.
  if (Enable == true && VectorizeWidth && VectorizeWidth->isScalar() &&
      InterleaveCount == 1)
    return TM_SuppressedByUser;

  if (getBooleanLoopAttribute(L, "llvm.loop.isvectorized"))
    return TM_Disable;

  if (Enable == true)
    return TM_ForcedByUser;

  if ((VectorizeWidth && VectorizeWidth->isScalar()) && InterleaveCount == 1)
    return TM_Disable;

  if ((VectorizeWidth && VectorizeWidth->isVector()) || InterleaveCount > 1)
    return TM_Enable;

  if (hasDisableAllTransformsHint(L))
    return TM_Disable;

  return TM_Unspecified;
}

Thanks for your comments. This patch is not intended to have no consequence - it's just not being communicated well in the commit message.

The problem at the moment, with mainline clang, is that specifying:

#pragma clang loop vectorize_width(4) vectorize_predicate(enable)

(or under MVE, as predication is often done by default):

#pragma clang loop vectorize_width(4) vectorize_predicate(disable)

The predication is ignored because the width is set. As in:
https://godbolt.org/z/ex8jvW
So there is no way at the moment to control predication and width at the same time.

That is what this patch is aiming to fix, and that fix alone is quite simple. Unfortunately in order to fix that we also need to deal with (like you say) the awkward edge case where vectorize_width(1) is set and predication is enabled. But that's really just an edge case. This patch makes it work similar to interleaving, as far as I understand. If the user specifies

#pragma clang loop vectorize_width(1) interleave_count(4) vectorize_predicate(enable)

They get no vectorization, but interleaving which is predicated. I.e what they asked for. (There's a chance that is then SLP vectorizes, but that's a different problem :-) ) The predicated interleaving will likely be horrible for performance, but if the user asked for it I don't see a reason not to give it to them in this case. And the more interesting case of when a vector width is specified should start working. The only thing that is really changing is when llvm.loop.vectorize.predicate.enable is emitted, and I only meant to imply for the predicate pragma specifically, that the user should get what they ask for so long as the vectorizer can provide it.

I agree with you in general that pragmas/metadata are difficult and should not be changed without good reason.

Meinersbur accepted this revision.Feb 11 2021, 11:10 AM

Thank you for the clarification. It would be great if that could be communicated in the summary/commit message. It's sounds like emitting llvm.loop.vectorize.predicate.enable when vectorization is disabled is the motivation behind the patch.

It's not great that vectorize_predicate also influences interleave(enable), but due to both being done be the LoopVectorize pass, seems hard to fix.

malharJ edited the summary of this revision. (Show Details)Feb 11 2021, 2:05 PM

I just pushed the patch with an slight addition to the summary motivitating why always emitting the predicate metadata: https://github.com/llvm/llvm-project/commit/74ddacd30de54e19b28218bc8563bd0f34f32cad