This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Pragma vectorize_predicate implies vectorize
ClosedPublic

Authored by SjoerdMeijer on Aug 5 2019, 2:19 PM.

Details

Summary

New pragma "vectorize_predicate(enable)" now implies "vectorize(enable)",
and it is ignored when vectorization is disabled with e.g.
"vectorize(disable) vectorize_predicate(enable)".

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Aug 5 2019, 2:19 PM
SjoerdMeijer added a comment.EditedAug 5 2019, 2:39 PM

Forgot that this requires a doc change too. Will add that once we're happy with the proposed behaviour.

Mmmh, I would have expected this to work the same way as vectorize_width. According to the docs:

The following example implicitly enables vectorization and interleaving by specifying a vector width and interleaving count:
#pragma clang loop vectorize_width(2) interleave_count(2)
for(...) {
...
}

However, vectorize_width does not automatically set llvm.loop.vectorize.enable. Neither does llvm.loop.vectorize.width > 1 imply LoopVectorizeHints::getForce(). At some places they are checked together, such as LoopVectorizeHints::allowReordering(). Other places, notably LoopVectorizationCostModel::selectVectorizationFactor(), only queries getForce(). That is, vectorize_width(2) does not implicitly force vectorization in practice.

Thanks for pointing this all out!

I am not entirely sure yet what to think about all this as I am new to the loop pragma business, but I think it looks inconsistent to me!

I think I find allowReordering() a little bit ugly, because it is also checking getWidth() > 1, and I am probably expecting the allowReordering decision to be based on 1 thing (the Force). But in the current implementation that probably makes sense though if vectorizer_width doesn't set vectorizer.enable (again, which is not what I would expect). Thus, it makes sense to me that LoopVectorizationCostModel::selectVectorizationFactor() only checks getForce()

Mmmh, I would have expected this to work the same way as vectorize_width.

So I don't see at this moment how it could work in the same way. And by working in the same way I think you mean adding vectorize_predicate checks to different places where vectorize_width is also checked, but that doesn't sound ideal to me.

Could a conclusion be that I was lucky? Lucky, because without all this knowledge that I have now, this patch looks to do what we would expect?

fhahn added a comment.Aug 6 2019, 9:02 AM

Mmmh, I would have expected this to work the same way as vectorize_width. According to the docs:

The following example implicitly enables vectorization and interleaving by specifying a vector width and interleaving count:
#pragma clang loop vectorize_width(2) interleave_count(2)
for(...) {
...
}

However, vectorize_width does not automatically set llvm.loop.vectorize.enable. Neither does llvm.loop.vectorize.width > 1 imply LoopVectorizeHints::getForce(). At some places they are checked together, such as LoopVectorizeHints::allowReordering(). Other places, notably LoopVectorizationCostModel::selectVectorizationFactor(), only queries getForce(). That is, vectorize_width(2) does not implicitly force vectorization in practice.

The current behavior with respect to vectorize_width seems a bit counterintuitive. The docs say vectorize_width implicitly enables vectorization, I would expect it to behave as if vectorize(enable) was also passed. There's also a PR about that mismatch: https://bugs.llvm.org/show_bug.cgi?id=27643

IMO it would make sense to have the more specific pragmas imply vectorize(enable) here (or update the docs accordingly).

Hi Florian, thanks for your input!

IMO it would make sense to have the more specific pragmas imply vectorize(enable) here (or update the docs accordingly).

Yep, fully agree with that, as I also wrote in my previous comment. And thanks for digging up that PR. I think that supports this case and that the proposed behaviour in this patch is what one would expect. If we agree on that, I wouldn't mind fixing that PR too as that seems low hanging-fruit to me.

And to be honest, at the moment I have no clue what the meaning is, or could/should be, when a more specific pragma like vector_width or vector_predicate is set, without also setting/implying vectorize(enable). I need to look into that...

Looking at the similar situation of unroll(enable)/unroll_count(4), unroll_count also does not set llvm.loop.unroll.enable, but it is handled by the LoopUnroll pass itself:

bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
                      PragmaEnableUnroll || UserUnrollCount;

(LoopUnrollPass.cpp line 770f)

I do not know whether/how "setting a transformation option implicitly enables the transformation" should be implemented, maybe we should discuss this. It is currently inconsistent. Also consider that non-Clang frontends and .ll files in the wild might also expect a specific behavior.

I do not know whether/how "setting a transformation option implicitly enables the transformation" should be implemented, maybe we should discuss this.

Yep, agreed. I've sent a mail to the dev list:
http://lists.llvm.org/pipermail/cfe-dev/2019-August/063054.html

thanks @Meinersbur and @fhahn for the discussions on this, also on the dev list.
It looks like we are happy with this behaviour?

fhahn accepted this revision.Aug 13 2019, 12:30 PM
fhahn added reviewers: hfinkel, reames.

LGTM, thanks! Please wait a few days with committing, in case any additional comments come in via the mailing list.

This revision is now accepted and ready to land.Aug 13 2019, 12:30 PM

Thanks for your help! And I will wait a few days.

After this, I will look at that PR, will have a look at diagnostics, and then at the LLVM side of things.

Meinersbur accepted this revision.Aug 13 2019, 7:43 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2019, 11:23 PM