Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
test/CodeGen/AArch64/aarch64-faddv.ll | ||
---|---|---|
2 ↗ | (On Diff #190266) | Can you commit this to trunk with the current codegen so this patch shows the codegen delta? |
Would it be possible to avoid the FADDV node and directly match on VECREDUCE_FADD? It would be nice to replace all of these target specific nodes with their generic equivalents.
This patch now matches ISD::VECREDUCE_FADD directly, and shows the diff with test/CodeGen/AArch64/vecreduce-fadd.ll.
The change generally looks good to me, but I'm not familiar enough with isel to review this on a technical level.
include/llvm/Target/TargetSelectionDAG.td | ||
---|---|---|
284 ↗ | (On Diff #190422) | What happened to the accumulator argument? IMO we shouldn't have it all but the intrinsic currently defines it. |
include/llvm/Target/TargetSelectionDAG.td | ||
---|---|---|
284 ↗ | (On Diff #190422) | The accumulator argument is not part of VECREDUCE_FADD, and is only represented in the VECREDUCE_STRICT_FADD for the reason that it _must_ start off with that value as part of the ordered reduction, where for the non-strict VECREDUCE_FADD it can be accumulated separately using a regular FADD. There should probably be a separate SDT_FPStrictVecReduce when we implement this for VECREDUCE_STRICT_FADD. Looking in SelectionDAGBuilder::visitVectorReduce() I see that the accumulator operand is actually ignored for non-strict fadd reductions, which is something that we'll need to fix. |
Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).
ping!
include/llvm/Target/TargetSelectionDAG.td | ||
---|---|---|
284 ↗ | (On Diff #190422) | Please note that I have pinged off the question on the accumulator argument of the LLVM IR intrinsic to the mailing list in an RFC. |
- Rebased patch.
- Following D60261, the scalar accumulator is either folded away or added separately as a scalar value, and so this patch only needs to focus on optimising the vector reduction itself.
I totally forgot about this patch tbh, but I think the logic still applies. I'll rebase the patch.
lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
4425 ↗ | (On Diff #205039) | Should/could the second Rn be IMPLICIT_DEF? Same for below. |
Thanks. LGTM
llvm/test/CodeGen/AArch64/vecreduce-fadd.ll | ||
---|---|---|
112 | Can you make sure there is some test where the first element isn't -0.0. (I think it should work fine, but it would be good to make sure there is a test for it) |
Can you make sure there is some test where the first element isn't -0.0. (I think it should work fine, but it would be good to make sure there is a test for it)