This is an archive of the discontinued LLVM Phabricator instance.

[LV] Tail-loop Folding
ClosedPublic

Authored by SjoerdMeijer on Jul 24 2019, 3:57 AM.

Details

Summary

This allows folding of the scalar epilogue loop (the tail) into the main
vectorised loop body when the loop is annotated with a "vector predicate"
metadata hint. To fold the tail, instructions need to be predicated (masked),
enabling/disabling lanes for the remainder iterations.

This depends on D64744 that introduces the llvm.loop.vectorize.predicate.enable
pragma and metadata node, and D64916 which is a refactoring to make tail
folding a more general concept.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jul 24 2019, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 3:57 AM

[serious] There is a LoopVectorizeHints class in LoopVectorizationLegality.cpp that should be used.

[serious] Documentation of llvm.loop.vectorize.predicate.enable is missing.

Just just realized that docs for llvm.loop.vectorize.predicate.enable is part of D64744, which otherwise is a clang-only patch.

Hi Michael, thanks for taking a look again!

Just just realized that docs for llvm.loop.vectorize.predicate.enable is part of D64744, which otherwise is a clang-only patch.

Yep, indeed, so I assume that's all good.

There is a LoopVectorizeHints class in LoopVectorizationLegality.cpp that should be used.

Ah yes, thanks for the suggestion, I will start looking into this, and will move the pragma handling to some function in there.

Just just realized that docs for llvm.loop.vectorize.predicate.enable is part of D64744, which otherwise is a clang-only patch.

Yep, indeed, so I assume that's all good.

Before we had the monoropo reviewers frequently asked to split patches into the LLVM and Clang part. With the monorepo, I am not sure the rule still needs to be followed. At least, I did not expect LLVM documentation in a clang patch, so sorry for the non-applicable comment.

Before we had the monoropo reviewers frequently asked to split patches into the LLVM and Clang part. With the monorepo, I am not sure the rule still needs to be followed. At least, I did not expect LLVM documentation in a clang patch, so sorry for the non-applicable comment.

It does because we're still committing to SVN. Once we enable write mode on the monorepo, that'll change.

Ha, that's funny, because before noticing these comments here, I was just doing a test commit (366904) with the github monorepo workflow.
With the discussion on the dev list that the transition date is near, and just following the public documentation in https://llvm.org/docs/GettingStarted.html, it really looks like I committed to the clang and llvm repo at the same time using the git llvm push script from a local git monorepo. I was of course aware of separating clang and llvm patches, but again, thought that this new workflow is fully accepted/supported.

Anyway, back to looking at LoopVectorizationLegality.cpp :-)

We probably need to discuss whether vectorize_predicate(enable) should (or should not) implicitly turns on vectorize(enable) or not. I guess the current behavior is "does not", right? We don't have to discuss that in this review, but we still want to make a conscious decision one way or the other, or did I miss that discussion?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
849

I think the nuance here is rather ScalarEpilogueNotNeededPredicatePragma. In other words, if scalar epilogue is needed for some other reason (but still okay to skip scalar epilogue execution when vector code executes), scalar epilogue can be emitted/utilized. Runtime vectorization legality check of all kinds fits in that profile. We shouldn't overload "predicated vector code" pragma with "don't emit scalar epilogue" meaning.

4777

-Os/-Oz message comes out from fall through. Not desired.

4801

How about

// Accept MaxVF if we don't have a tail at all.

and move the comment inside IF.

About:

We probably need to discuss whether vectorize_predicate(enable) should (or should not) implicitly turns on vectorize(enable) or not. I guess the current behavior is "does not", right? We don't have to discuss that in this review, but we still want to make a conscious decision one way or the other, or did I miss that discussion?

Nope, you're exactly right. We haven't discussed this yet, it had also crossed my mind, and we should discuss it. Your statement about the current behaviour is also right.

I will first look into addressing previous comments. My responses might be delayed due to an upcoming holiday, but finishing this is my highest priority.

For SVE we found that there are sometimes benefits to using an unpredicated vector body plus a predicated tail. When the main vectorized loop-body is unpredicated, we know all lanes in the vector are executed and can produce more efficient set of instructions. The scalar tail can then still be vectorized using predication to mask off the inactive lanes, or depending on the cost of vectorizing the tail loop the compiler may want to choose not vectorizing the tail loop at all. It would be nice if your design allows for this use-case.
So maybe instead of having a boolean 'llvm.loop.vectorize.predicate.enable' you can make it into an enum, or perhaps rename the attribute to emphasises the difference so we can add this logic later?

llvm/lib/Analysis/LoopInfo.cpp
516 ↗(On Diff #211460)

nit:

return Name.equals(S->getString()) &&
       mdconst::extract<ConstantInt>(MD->getOperand(1))->getZExtValue());
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7367

nit: unnecessary whitespace.

SjoerdMeijer added a comment.EditedJul 25 2019, 3:51 AM

Thanks for taking a look at this!

Some initial thoughts on this:

For SVE we found that there are sometimes benefits to using an unpredicated vector body plus a predicated tail. When the main vectorized loop-body is unpredicated, we know all lanes in the vector are executed and can produce more efficient set of instructions. The scalar tail can then still be vectorized using predication to mask off the inactive lanes, or depending on the cost of vectorizing the tail loop the compiler may want to choose not vectorizing the tail loop at all. It would be nice if your design allows for this use-case.
So maybe instead of having a boolean 'llvm.loop.vectorize.predicate.enable' you can make it into an enum, or perhaps rename the attribute to emphasises the difference so we can add this logic later?

In the current flow, the only use-case that we have so far, is that predicate.enable set by a pragma. As it is a pragma, like any other pragma, it is the user's responsibility whether this makes sense and is profitable, etc.

Another use case, is that predicate.enable is set by a loop vectorisation profitability analysis. Whether this is profitable or not, will indeed depend on the target (SVE, MVE, AVX, etc.), the core implementation, and different loop properties. So I can imagine that different target hooks will be required for this decision making, which can then result in setting predicate.enable. Thus, I don't think it excludes any use case, but in fact is the ground work for other use-cases.

I think I've addressed all comments, the main ones are:

  • I've moved the loop hint handling to LoopVectorizationLegality.cpp
  • I've renamed ScalarEpilogueNotNeededPredicatePragma
  • and finally created a helper function to avoid some code duplication that has been bothering me for a while
fhahn added inline comments.Jul 25 2019, 6:08 AM
llvm/test/Transforms/LoopVectorize/tail_loop_folding.ll
1

It looks like assertions are not required for the test case.

5

If this test relies on the x86 cost model/x86 masked instructions, it should go into the subfolder I think.

I've moved the test case to the X86 subfolder and removed the ASSERT.

I obviously want to add some MVE tests too, but will do that later. The X86 cost model and masked instructions are a nice demonstrator :-)

ran clang-format.

Friendly ping :-)

Looks fine to me, but see what the other reviewers say.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
870–880

[nit] this seems unrelated?

7278–7279

[nit] formatting-only change?

7463–7464

[nit] unrelated change?

Friendly ping :-)

Looks like we are converging. One minor comment only.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
849

Thanks for addressing. May not be immediately effective, but should help if someone wants to move towards that direction.

4777

Thanks for taking care of it.

4801

Suggest moving this comment between the Lines 4806 and 4807.

Thanks for taking another look!

Feedback addressed, moved comment.

SjoerdMeijer added a comment.EditedJul 30 2019, 11:11 AM

Sorry for asking, @hsaito , but I was just wondering and wanted to check if your last comment was in fact a LGTM with a minor nit.

hsaito accepted this revision.Jul 31 2019, 9:52 AM

LGTM, pending the discussion about the exact meaning of the newly introduced "vector predicate" pragma (expect this to happen outside of this review). Please wait for another day to give others last minute opportunity to give feedback.

This revision is now accepted and ready to land.Jul 31 2019, 9:52 AM
SjoerdMeijer added a comment.EditedAug 1 2019, 11:17 AM

Many thanks for all your help and reviews!

I will start the discussion about the interaction between the vector predicate and vectorize pragmas as soon as I am back in the office. I will have a closer look first and try to form a better opinion, but my first thought at this moment is that enabling "vectorize_predicate" should simply imply "vectorize". As soon as I'm ready, I will upload a patch and perhaps a message to cfe dev.

This revision was automatically updated to reflect the committed changes.