Page MenuHomePhabricator

[TTI][LV] preferPredicateOverEpilogue
ClosedPublic

Authored by SjoerdMeijer on Oct 16 2019, 7:20 AM.

Details

Summary

We currently have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third, a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision (without forcing it).

I did the initial TTI plumbing for this, added usage of this new hook to the vectoriser where this should be queried, and added the beginning of an ARM MVE implementation. While this isn't complete yet, it currently behaves as a non-functional change, it demonstrates the required function interfaces. I.e., for MVE, we would like the vectoriser to do tail-folding when we know we will be generating a hardware-loop and these checks are implemented in the ARM specific hook. I will follow-up on this soon, but that will be an entirely ARM specific patch.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Oct 16 2019, 7:20 AM
SjoerdMeijer edited the summary of this revision. (Show Details)Oct 16 2019, 7:21 AM
SjoerdMeijer edited the summary of this revision. (Show Details)
samparker added inline comments.Oct 16 2019, 7:32 AM
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1010

What about MVE..? We also need to have masked load/stores enabled.

llvm/test/Transforms/LoopVectorize/ARM/tail-loop-folding.ll
10

I think it's worth adding your tests now for the conditions that you're adding, even if this means adding an extra backend command to enable/disable tail predication. I imagine we'll be testing for a while so I think it's worth it.

Thinking about this some more, I think it would be best to at least check some features of the loop for legality:

  • no vector widths greater than 128 bits.
  • all vector operations should have the same number of lanes.

Thinking about this some more, I think it would be best to at least check some features of the loop for legality:

    • no vector widths greater than 128 bits.
  • all vector operations should have the same number of lanes.

Yep, this is exactly what I wanted to do in the follow up as this is mostly target dependent.
But you do have good points, and perhaps addressing some of that now is a good thing. And so, looking into things now, what I wanted to implement in the target specific part, is something very similar to LoopVectorizationCostModel::getSmallestAndWidestTypes. I will reuse that, and also pass the smallest and widest type to preferPredicateOverEpilogue, this is the info we need for the legality checks.

Yeah. It might be better expressed as "what the vectorizer produces will need to end up fitting into 128 bits". You could imagine the vectorizer producing something wider but then folding them in ISel into something legal.

That's maybe something to leave till later, but keep in mind non-the-less.

hsaito added inline comments.Fri, Oct 18, 3:41 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7437

@SjoerdMeijer thanks for the work. I like the direction you are heading to. Now that you are stepping into making this behavior default for some arch for some loop, please think about the hint needed to reverse that default choice, something along the lines of (TTI->prefer...(...) && Hints.getDoNotPredicate())

Thanks, will take that on board, and I will pick this up after the llvm dev conference.

I have:

  • changed the ARM implementation to check the required architecture extensions: check for MVE, the remaining checks are done in isHardwareLoopProfitable so there was no point in duplicating that.
  • added a new test case, to cover testing these architecture extension combo's, and profitability check.

@samparker, @dmgreen :
I've given it quite some thoughts to see if the current interface of preferPredicateOverEpilogue would be good enough to make a well informed decision whether we prefer to predicate or not. For example, questions were if we should also pass the Vectorization Factor (VF), and the vector types, but there are quite a few chicken-and-eggs-problems here. That is, we decide quite early if we want to predicate or not, long before we have calculated the VF, so let alone about the vector types. So, should preferPredicateOverEpiloguethen possibly return a set of VF candidates, or calculate the VF a lot earlier? Well, possibly, but that's also what the rest of pipeline is trying to figure out. But more importantly, I don't think we need without sacrificing too much. That is, we are going to be conservative anyway and I think the current interface is good decide on tail-predication in this way:

  • we bail if we find operations on i64, because we don't support (vector) operations on them,
  • we look if the data processing instructions all operate on the same and uniform data types (data type less or equal i32s),
  • we look at the loads and stores, and see access strides are the same and in the same directions. This makes sure we don't generate shuffles, which would mess with data elements and possibly masking/tail-predicating the wrong lanes.
  • and then we then rely on the vectoriser and the rest of the pipeline to choose a vectorisation factor.

@hsaito :
Many thanks for the suggestion! I will do that right away after this one as a follow up.

samparker accepted this revision.Mon, Nov 4, 3:47 AM

Ok, fair enough.

This revision is now accepted and ready to land.Mon, Nov 4, 3:47 AM
SjoerdMeijer added a comment.EditedTue, Nov 5, 7:40 AM

Thanks all!

I will commit this soon'ish as this is a NFC change, and in the mean time I have uploaded a work-in-progress patch D69845 just to demonstrate usage of this for MVE. It also contains a test case showing that we do not yet respect hint vectorize_predicate(disable) as mentioned by Hideki, which I will work on next as well as finishing D69845.

This revision was automatically updated to reflect the committed changes.