This is an archive of the discontinued LLVM Phabricator instance.

[Intrinsics] define semantics for experimental fmax/fmin vector reductions
ClosedPublic

Authored by spatel on Sep 9 2020, 9:18 AM.

Details

Summary

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations.

There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct.

Diff Detail

Event Timeline

spatel created this revision.Sep 9 2020, 9:18 AM
Herald added a project: Restricted Project. · View Herald Transcript
spatel requested review of this revision.Sep 9 2020, 9:18 AM
nikic added a comment.Sep 9 2020, 9:54 AM

The LangRef change looks good to me. I've pointed out one place where the SDAG expansion doesn't seem quite compatible with it.

llvm/lib/CodeGen/ExpandReductions.cpp
163

Would it be sufficient to only check nnan here, or does the expansion rely on something more?

llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

This lowering looks incorrect for the case where both elements are NaN. We'll fold to +INF then. We probably have an expansion that assumes +INF is a neutral element for fminnum, but it isn't in the presence of NaNs :/

spatel added inline comments.Sep 9 2020, 10:17 AM
llvm/lib/CodeGen/ExpandReductions.cpp
163

It's not safe currently.

llvm::createMinMaxOp() always creates instructions that are fully 'fast':

// We only match FP sequences that are 'fast', so we can unconditionally
// set it on any generated instructions.

And I think we are seeing that bug manifested in PR35538.

spatel added inline comments.Sep 9 2020, 10:26 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

Good catch - something in vector legalization does that:

    t15: v4f32 = insert_vector_elt t12, ConstantFP:f32<INF>, Constant:i32<2>
  t17: v4f32 = insert_vector_elt t15, ConstantFP:f32<INF>, Constant:i32<3>
t18: f32 = vecreduce_fmin t17
dmgreen added inline comments.Sep 9 2020, 10:40 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

I originally thought this was because we don't go through ExpandReductions, widening them in ISel instead. They do look like they get padded with +/- Inf in that case.

But we do expand pre-isel if NoNan isn't present in shouldExpandReduction. I looks like some of the expansion of min/max is unconditionally setting fast flags in llvm::createMinMaxOp. Unless I'm mistaken.

The padding with +/- inf is likely a problem on it's own right too.

300–306

It seems like some of these are _better_ than the fast math versions! :)

dmgreen added inline comments.Sep 9 2020, 10:52 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

Oh I see you are change how that works. It sounds like shouldExpandReduction could be updated then?

spatel updated this revision to Diff 290773.Sep 9 2020, 11:15 AM

Patch updated:
Change the neutral element for min/max reductions in WidenVecOp_VECREDUCE() from INF to quiet NaN.
We're probably missing generic and/or target-specific simplifications for those patterns, but I think that can be an enhancement.

craig.topper added inline comments.Sep 9 2020, 11:19 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4811–4812

Do we need to drop nonan FMF then? Probably should have been dropping noinf before.

Do we have non-power of 2 tests for X86? X86 needs nonan to optimally lower fmaxnum/fminnum. But if you put a nan here then we shouldn't be using optimal lowering.

spatel added inline comments.Sep 9 2020, 11:26 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

Hmm...not sure.
It's not clear to me what the benefit of expanding in IR was/is. Was that needed because there was no common definition for these intrinsics/nodes?

The ARM override says:

// Can't legalize reductions with soft floats, and NoNan will create
// fminimum which we do not know how to lower.
return TLI->useSoftFloat() || !TLI->getSubtarget()->hasFPRegs() ||
       !II->getFastMathFlags().noNaNs();

So at the least I should update the comment. Leave the TLI checks but remove the FMF check?

nikic added inline comments.Sep 9 2020, 11:31 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

That's right, you can drop the noNaNs check now (there should be a similar one in AArch64). This is intended to never use the IR expansion unless needed to avoid SDAG assertions. Those will be gone for the nnan case now.

spatel added inline comments.Sep 9 2020, 11:49 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4811–4812

Yes, we need to drop 'nnan' - otherwise this would create poison.
No, we don't have non-pow-2 vector sizes in x86 tests from what I see. I'll add some.

spatel updated this revision to Diff 290813.Sep 9 2020, 1:19 PM

Patch updated:

  1. Removed FMF conditions for AArch64/ARM expansion of reductions in IR (no test diffs from that IIUC).
  2. Clear 'nnan' when legalizing vector types by using NaN elements.
  3. Added x86 tests for odd vector types (several of these would crash without this patch because there is no default or x86 lowering for ISD::FMAXIMUM / ISD::FMINIMUM).
nikic added inline comments.Sep 9 2020, 1:58 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4811–4812

Given how much X86 needs nnan for a decent lowering here, would it make sense to keep using +/- infinity if nnan is set, and only use qNan if it is not set?

spatel added inline comments.Sep 9 2020, 2:45 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4811–4812

Yes, but I think it's a little trickier than that. As Craig hinted, if we use inf, then we need to clear 'ninf' or we have the same poison problem.

Given that this is probably just crashing currently, the bar for quality is pretty low. :)
I'd defer enhancements to a follow-up if that's ok.

dmgreen added inline comments.Sep 10 2020, 12:02 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
247

It's not clear to me what the benefit of expanding in IR was/is.

I agree. I think a lot of it was legacy, and expanding during ISel seems like a better way forward if we can make it work.

nikic added inline comments.Sep 10 2020, 12:53 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4811–4812

That's okay as well. In that case I'd suggest to duplicate the fmin-nnan tests into fmin-fast for X86, so we retain coverage for the lowerings we actually want to see. Previously nnan was sufficient for that, now it isn't. (Though not just due to this issue, I guess our vecreduce legalization just generally doesn't work great for X86 right now).

spatel updated this revision to Diff 290954.Sep 10 2020, 6:15 AM

Patch updated - no code changes, but:

  1. Rebased after rG0a5dc7e (thanks, @nikic !) - so we have optimal lowering in several cases now.
  2. Removed some 'FIXME' comments in the Thumb2 file based on above.
  3. Added test file for x86 with 'fast' rG1ebb31b1 to retain coverage for optimal lowerings (not changed with this patch).
nikic accepted this revision.Sep 10 2020, 6:40 AM

LGTM

llvm/test/CodeGen/Generic/expand-experimental-reductions.ll
306

Don't think there's any plans to support non-pow2 vectors in the IR expansions.

This revision is now accepted and ready to land.Sep 10 2020, 6:40 AM
dmgreen added inline comments.Sep 10 2020, 6:43 AM
llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
5

Thanks!