This is an archive of the discontinued LLVM Phabricator instance.

[LV] Allow tail folded reduction selects to remain in the loop
ClosedPublic

Authored by dmgreen on Jul 28 2020, 3:35 AM.

Details

Summary

The normal scheme for tail folding reductions is to use:

loop:
  p = phi(0, a)
  mask = ...
  x = masked_load(..., mask)
  a = add(x, p)
s = select(mask, a, p)

This means we need to keep the register p and a alive out of the loop, plus the mask. On a target with predicated operations we can instead generate the phi as p = phi(0, s). This ensures the select in the loop and we can fold select(m, add(a, b), c) to something like a vaddt c, a, b using the m predicate. This in turn allows us to tail predicate the entire loop.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 28 2020, 3:35 AM
dmgreen requested review of this revision.Jul 28 2020, 3:35 AM
SjoerdMeijer added inline comments.Aug 11 2020, 8:49 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
1261 ↗(On Diff #281157)

typo: to to

1262 ↗(On Diff #281157)

typo: e -> be

1264 ↗(On Diff #281157)

Probably best to do the TTI changes separately. Just kind of echoing what people told me last time...but I guess it is more convenient in case of reverts of the LV part for example.

1264 ↗(On Diff #281157)

Bikeshedding names, so ignore if you don't think it is a good fit.
"InLoop" is used in helpers above. I was thinking if preferInLoopReductionSelect would be more consistent and clear.

Then I was wondering the next thing.... can we not simply use preferInLoopReduction? Is that not more or less the same, or can it be the same?

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1560 ↗(On Diff #281157)

Is this an unrelated change (the deletions)?

Thanks for taking a look. I will update this soon...

llvm/include/llvm/Analysis/TargetTransformInfo.h
1264 ↗(On Diff #281157)

You read my mind. I had it called preferInLoopReductionSelect originally, but I changed it because it is really a different concept to the inloop reductions. Inloop reductions are about placing a vecreduce in the loops as opposed to after it. This patch is about being able to transform add; select into a predicated add, which can make the select more efficient in the loop.

Hopefully that would make it useful in different architectures too, if they have the ability to predicate the add.

1264 ↗(On Diff #281157)

Yeah. I was trying to avoid having to add an option for it, and it wouldn't do anything without a target hook. I'll add one though and move the TTI part out.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1560 ↗(On Diff #281157)

The idea is that we can now handle any reductions in tail predicated loops, not just integers. But we still have an option to disable them. I have to check that they actually do OK all the time though. I know predicated VFMA was missing until now. I'm not sure if any others are needed...

dmgreen updated this revision to Diff 285670.Aug 14 2020, 10:54 AM
dmgreen retitled this revision from [LV][ARM] Allow tail folded reduction selects to remain in the loop to [LV] Allow tail folded reduction selects to remain in the loop.

Now just the Vectorizer part and an option to test it with, plus a new test.

The rest is now in D85980

SjoerdMeijer accepted this revision.Aug 17 2020, 1:19 AM

Thanks, this now looks like a small and good change, LGTM.

This revision is now accepted and ready to land.Aug 17 2020, 1:19 AM