This is an archive of the discontinued LLVM Phabricator instance.

[SLP] delete unused pairwise reduction option
ClosedPublic

Authored by spatel on Dec 28 2020, 8:12 AM.

Details

Summary

SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as:

/// Pairwise:
///  (v0, v1, v2, v3)
///  ((v0+v1), (v2+v3), undef, undef)
/// Split:
///  (v0, v1, v2, v3)
///  ((v0+v2), (v1+v3), undef, undef)
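
For illustration (a hand-written sketch, not code taken from the patch; value names are made up), the split form of a reassociable <4 x float> fadd reduction expands to one shuffle and one fadd per step:

  %rdx.shuf = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %bin.rdx = fadd fast <4 x float> %v, %rdx.shuf                 ; lanes: (v0+v2, v1+v3, undef, undef)
  %rdx.shuf1 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx1 = fadd fast <4 x float> %bin.rdx, %rdx.shuf1         ; lane 0: (v0+v2) + (v1+v3)
  %res = extractelement <4 x float> %bin.rdx1, i32 0

The pairwise form needs two shuffles per step to separate the even and odd lanes before the add:

  %pw.lhs = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  %pw.rhs = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %pw.rdx = fadd fast <4 x float> %pw.lhs, %pw.rhs               ; lanes: (v0+v1, v2+v3, undef, undef)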

I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work around the need for this through cost model and codegen improvements.

Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction().

Diff Detail

Event Timeline

spatel created this revision. · Dec 28 2020, 8:12 AM
spatel requested review of this revision. · Dec 28 2020, 8:12 AM
Herald added a project: Restricted Project. · Dec 28 2020, 8:12 AM

Hi Sanjay, pairwise reduction does not seem to be adopted by any targets. But do we need to consider the FMF here? I see that the LangRef says the reduction intrinsic is sequential for fadd and fmul if the reassoc flag is not set.

This would probably mean we can remove all the pairwise support from the TTI reduction cost callbacks as a followup.

Hi @pengfei -
Sorry for the delayed reply - I did not get an email notification for your comment here in Phab.

We do not need to care about FMF directly here because neither of these shuffle variations corresponds to "sequential" reduction of the elements. In other words, we should require reassoc (at the least) to form an fadd or fmul reduction that is expected to need any shuffles. Let me know if you see any holes in that theory.

One motivation for this cleanup is to correct bugs with using/propagating FMF (that was also true for D87416). For example - https://llvm.org/PR35538
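
To make the LangRef distinction concrete (a sketch with placeholder names): the plain intrinsic is an ordered, sequential reduction, while the reassoc form is allowed to be reassociated and therefore expanded with shuffles:

  %ord = call float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)           ; strictly %acc+v0+v1+v2+v3 in order
  %rel = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)   ; may be reassociated/shuffled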

Thanks for the explanation. I was confusing it with the reduction intrinsics. I noticed tryToReduce sets the fast flag instead of requiring it. Is that the problem you mentioned? By the way, can we build the reduction intrinsics directly here instead of doing the shuffles?

SLP requires fast as a pre-condition before matching a reduction (although I'm not sure if that implementation is completely sound), so the hard-coded logic:

Unsafe.setFast();
Builder.setFastMathFlags(Unsafe);

...is theoretically safe here. But yes, that is one bug I am hoping to fix. LoopVectorizer appears to have different bug(s).

The code in SLP's emitReduction() is strange - we rely on createSimpleTargetReduction() on one path, but not the other (without this patch).
We should use that utility all the time, so we are not duplicating it here in SLP. I am trying to clean up that code too - for example 8d18bc8e6db7 .
So that is why I suggested that the alternative to this patch would be to push the shuffle logic into that utility. That way, LoopVectorizer could use it too (if it was actually needed by some target).
Let me know if I am not answering/understanding the questions.

I was wrong - I thought we had test coverage for this, but I don't see it now. Here is an example of an SLP miscompile: 3567908d8ceb
The transformed code with reduction assumes things like ninf and nnan, but those do not exist in the original code.
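
Roughly (a hypothetical reduction of that pattern, not the exact test from that commit, with %v standing for the vectorized <4 x float> of %a0..%a3): a chain like

  %a01 = fadd reassoc float %a0, %a1
  %a012 = fadd reassoc float %a01, %a2
  %a0123 = fadd reassoc float %a012, %a3

gets rebuilt as something like

  %rdx = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)

where the hard-coded fast implies nnan and ninf that the original fadds never had.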

That's more clear to me. Thank you.

Yep, that's definitely wrong. But I think FMF propagation is not easy. We cannot always copy the original FMF, right? E.g., a reassoc reduction may produce an inf that the "sequential" order wouldn't. Should ninf be propagated after the reduction or not?

We would propagate ninf in that situation. For example, if the incoming ops are all fast, then the outgoing ops will still be fast. We know that reassoc means that effectively anything is possible with FP, so it is the programmer's responsibility to deal with edge cases, not the compiler's responsibility. The LangRef defines this behavior correctly with poison:
http://llvm.org/docs/LangRef.html#fast-math-flags
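
In IR terms (a sketch): if the original scalar ops all carry fast, then the formed reduction can carry fast (including ninf) as well; the bug is only in adding flags that the original ops did not have, as in the example above.

  %s = fadd fast float %x, %y                                                               ; original ops already allow nnan/ninf/...
  %r = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)  ; same flags are safe to reuse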

RKSimon accepted this revision. · Jan 2 2021, 7:15 AM

LGTM to removing the pairwise reduction handling - the FMF cleanups appear to be completely separate.

This revision is now accepted and ready to land. · Jan 2 2021, 7:15 AM
This revision was automatically updated to reflect the committed changes.