This is an archive of the discontinued LLVM Phabricator instance.

[Intrinsic] Introduce reduction intrinsics for minimum/maximum
ClosedPublic

Authored by anna on Jun 7 2023, 6:47 AM.

Details

Summary

This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics as llvm.minimum and
llvm.maximum, i.e. it supports NaNs and signed zeroes.

Diff Detail

Event Timeline

anna created this revision.Jun 7 2023, 6:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 6:47 AM
anna requested review of this revision.Jun 7 2023, 6:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2023, 6:47 AM
anna added a reviewer: dmgreen.Jun 7 2023, 6:48 AM
nikic added a comment.Jun 7 2023, 8:33 AM

I'm fine with this addition, but...

We do not introduce a new SelectionDAG node for this, since we support
the lowering for these intrinsics by transforming them into a series of
shuffle and use the vectorized versions of llvm.minimum and
llvm.maximum (D152371).

You do need to add VECREDUCE nodes for this. The ExpandReductions pass is considered legacy functionality, mostly for use by X86. Targets like AArch64 do not use it and rely on VECREDUCE legalization instead.

anna added a comment.EditedJun 7 2023, 9:57 AM

You do need to add VECREDUCE nodes for this. The ExpandReductions pass is considered legacy functionality, mostly for use by X86. Targets like AArch64 do not use it and rely on VECREDUCE legalization instead.

I'm not really sure we need to introduce a node for this, since we have a generic lowering using the series of shuffles and vectorized fminimum.
There is a TTI hook for expanding reductions based on the intrinsic TTI->shouldExpandReduction(II), which calls to various <Target>TTIImpl (it's false for most targets except RISCV and X86).
We can change shouldExpandReduction for all targets <Target>TTIImpl to return true if II are these two reduction intrinsic and the base TTIImpl as well (for new targets).

I'm just curious what does relying on legalization helps with? Is it just a cleaner approach?

arsenm added a subscriber: arsenm.Jun 7 2023, 9:59 AM
arsenm added inline comments.
llvm/docs/LangRef.rst
17874–17875

This is the definition of the nnan flag, there's no point in re-documenting it on an individual operation

nikic added a comment.Jun 7 2023, 1:18 PM

I'm just curious what does relying on legalization helps with? Is it just a cleaner approach?

Basically yes. It makes it easy to custom lower for targets that have native reductions and produces a sensible tree legalization (that is easy to isel match) otherwise. The shuffle representation is the only way we have in IR, but it's not a convenient representation in DAG.

anna updated this revision to Diff 530077.Jun 9 2023, 2:10 PM

Introduced SelectionDAG nodes, legalization etc. Also, some tests in X86 and AArch64 to make sure that legalization works and the code is lowered using expandVecReduce.

anna updated this revision to Diff 530078.Jun 9 2023, 2:12 PM

addressed review comment about redundant line in langref.

anna marked an inline comment as done.Jun 9 2023, 2:12 PM
anna edited the summary of this revision. (Show Details)
craig.topper added inline comments.
llvm/include/llvm/CodeGen/ISDOpcodes.h
1321

FMAX/FMIN support NaNs they just don't propagate them unless all inputs are NaN.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
12393

The neutral element for FMINIMUM and FMAXIMUM shouldn't be NaN. That would turn the entire result into NaN.

anna added a comment.Jun 9 2023, 2:16 PM

I haven't added any custom actions for any targets, since that is out of the scope of this patch (any improvements can be done over this code).

anna added inline comments.Jun 9 2023, 2:24 PM
llvm/include/llvm/CodeGen/ISDOpcodes.h
1321

yes, the way of propagation is different. will update comment.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
12393

you're right. I completely missed what "neutral" actually means here. Would this be correct:

case ISD::FMAXIMUM:
case ISD::FMINIMUM:
  
const fltSemantics &Semantics = EVTToAPFloatSemantics(VT);
    APFloat NeutralAF = 
                        !Flags.hasNoInfs() ? APFloat::getInf(Semantics) :
                        APFloat::getLargest(Semantics);
    if (Opcode == ISD::FMAXIMUM)
      NeutralAF.changeSign();

i.e. neutral is either inf (if present) or the largest element for the VT.

nikic added inline comments.Jun 9 2023, 2:44 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
4

I'd suggest to copy over the full set of tests from llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll here, which should have test coverage for most of the legalizations.

anna added inline comments.Jun 9 2023, 3:54 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
4

okay, I tried that and this test fails at AArch64ISel:

define half @test_v4f16(<4 x half> %a) nounwind {
  %b = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> %a)
  ret half %b
}

It passes with +fullfp16 flag passed in. The reason looks like for AArch64 for f16 type, the default setOperationAction is PROMOTE for the baseOpCode being FMAXIMUM. When FP16 flag is passed in, setOperationAction is LEGAL, which satisfies the constraint isOperationLegalOrCustom for vector types in expandVecReduce. I think this needs fixing in AArch64 backend lowering (by adding some custom lowering for FMAXIMUM?), but I do not know this enough to fix it.

anna added inline comments.Jun 9 2023, 7:51 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
4

RootCaused the issue through debugging the selection DAG: we had a latent bug where we missed promotion for FPMINIMUM and FPMAXIMUM nodes in LegalizeDAG. I'll either fix it within this patch or add a separate one.

I haven't added any custom actions for any targets, since that is out of the scope of this patch (any improvements can be done over this code).

Sounds good. I'm happy to look into the legal instructions for AArch64. As far as I understand fminimum/fmaximum were relatively uncommon so there may be holes in the lowering. Let me know if I can help.

anna added a comment.Jun 12 2023, 9:06 AM

I haven't added any custom actions for any targets, since that is out of the scope of this patch (any improvements can be done over this code).

Sounds good. I'm happy to look into the legal instructions for AArch64. As far as I understand fminimum/fmaximum were relatively uncommon so there may be holes in the lowering. Let me know if I can help.

Thanks @dmgreen. As far as I can tell, the patch I've linked to this one should close the legalization bugs showing up in AArch64 for fmaximum/fminimum.

anna updated this revision to Diff 530652.Jun 12 2023, 1:25 PM
anna marked an inline comment as done.
anna edited the summary of this revision. (Show Details)

addressed review comments. Added more tests in legalization.

nikic added inline comments.Jun 12 2023, 1:50 PM
llvm/include/llvm/IR/IRBuilder.h
761

"supports NaNs and signed zeros" -> "follows the NaN and signed zero semantics of llvm.maximum" or so.

llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
21

Why do these all have nnan fmf?

anna added inline comments.Jun 12 2023, 2:01 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
21

I'll change this to (nnan for test we actually code generate v16f32 and one without for the same test) and remove for the rest of the tests.

anna added inline comments.Jun 12 2023, 2:08 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
21

removed nnan from all. they don't make a difference in the generated assembly. I thought code would be optimized for the nnan case.

anna updated this revision to Diff 530663.Jun 12 2023, 2:09 PM

updated tests. addressed review.

anna marked 2 inline comments as done.Jun 12 2023, 2:10 PM
nikic accepted this revision.Jun 12 2023, 2:19 PM

LGTM

llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
185

Can you please restore the ninf variant of this test? That one was useful to check the different neutral element used for the padding in the widened vector.

This revision is now accepted and ready to land.Jun 12 2023, 2:19 PM
anna added inline comments.Jun 12 2023, 7:55 PM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll
185

good point. I've also restored it and added a comment for the test.

anna updated this revision to Diff 530941.Jun 13 2023, 9:27 AM

updated tests.

This revision was landed with ongoing or failed builds.Jun 13 2023, 9:30 AM
This revision was automatically updated to reflect the committed changes.