This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics as llvm.minimum and
llvm.maximum, i.e. it supports NaNs and signed zeroes.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm fine with this addition, but...
We do not introduce a new SelectionDAG node for this, since we support
the lowering for these intrinsics by transforming them into a series of
shuffle and use the vectorized versions of llvm.minimum and
llvm.maximum (D152371).
You do need to add VECREDUCE nodes for this. The ExpandReductions pass is considered legacy functionality, mostly for use by X86. Targets like AArch64 do not use it and rely on VECREDUCE legalization instead.
You do need to add VECREDUCE nodes for this. The ExpandReductions pass is considered legacy functionality, mostly for use by X86. Targets like AArch64 do not use it and rely on VECREDUCE legalization instead.
I'm not really sure we need to introduce a node for this, since we have a generic lowering using the series of shuffles and vectorized fminimum.
There is a TTI hook for expanding reductions based on the intrinsic TTI->shouldExpandReduction(II), which calls to various <Target>TTIImpl (it's false for most targets except RISCV and X86).
We can change shouldExpandReduction for all targets <Target>TTIImpl to return true if II are these two reduction intrinsic and the base TTIImpl as well (for new targets).
I'm just curious what does relying on legalization helps with? Is it just a cleaner approach?
llvm/docs/LangRef.rst | ||
---|---|---|
17874–17875 | This is the definition of the nnan flag, there's no point in re-documenting it on an individual operation |
Basically yes. It makes it easy to custom lower for targets that have native reductions and produces a sensible tree legalization (that is easy to isel match) otherwise. The shuffle representation is the only way we have in IR, but it's not a convenient representation in DAG.
Introduced SelectionDAG nodes, legalization etc. Also, some tests in X86 and AArch64 to make sure that legalization works and the code is lowered using expandVecReduce.
I haven't added any custom actions for any targets, since that is out of the scope of this patch (any improvements can be done over this code).
llvm/include/llvm/CodeGen/ISDOpcodes.h | ||
---|---|---|
1321 | yes, the way of propagation is different. will update comment. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
12393 | you're right. I completely missed what "neutral" actually means here. Would this be correct: case ISD::FMAXIMUM: case ISD::FMINIMUM: const fltSemantics &Semantics = EVTToAPFloatSemantics(VT); APFloat NeutralAF = !Flags.hasNoInfs() ? APFloat::getInf(Semantics) : APFloat::getLargest(Semantics); if (Opcode == ISD::FMAXIMUM) NeutralAF.changeSign(); i.e. neutral is either inf (if present) or the largest element for the VT. |
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
4 | I'd suggest to copy over the full set of tests from llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll here, which should have test coverage for most of the legalizations. |
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
4 | okay, I tried that and this test fails at AArch64ISel: define half @test_v4f16(<4 x half> %a) nounwind { %b = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> %a) ret half %b } It passes with +fullfp16 flag passed in. The reason looks like for AArch64 for f16 type, the default setOperationAction is PROMOTE for the baseOpCode being FMAXIMUM. When FP16 flag is passed in, setOperationAction is LEGAL, which satisfies the constraint isOperationLegalOrCustom for vector types in expandVecReduce. I think this needs fixing in AArch64 backend lowering (by adding some custom lowering for FMAXIMUM?), but I do not know this enough to fix it. |
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
4 | RootCaused the issue through debugging the selection DAG: we had a latent bug where we missed promotion for FPMINIMUM and FPMAXIMUM nodes in LegalizeDAG. I'll either fix it within this patch or add a separate one. |
I haven't added any custom actions for any targets, since that is out of the scope of this patch (any improvements can be done over this code).
Sounds good. I'm happy to look into the legal instructions for AArch64. As far as I understand fminimum/fmaximum were relatively uncommon so there may be holes in the lowering. Let me know if I can help.
Thanks @dmgreen. As far as I can tell, the patch I've linked to this one should close the legalization bugs showing up in AArch64 for fmaximum/fminimum.
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
21 | I'll change this to (nnan for test we actually code generate v16f32 and one without for the same test) and remove for the rest of the tests. |
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
21 | removed nnan from all. they don't make a difference in the generated assembly. I thought code would be optimized for the nnan case. |
LGTM
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
185 | Can you please restore the ninf variant of this test? That one was useful to check the different neutral element used for the padding in the widened vector. |
llvm/test/CodeGen/AArch64/vecreduce-fmaximum.ll | ||
---|---|---|
185 | good point. I've also restored it and added a comment for the test. |
This is the definition of the nnan flag, there's no point in re-documenting it on an individual operation