Page MenuHomePhabricator

[AArch64] Use faddp to implement fadd reductions.
Needs ReviewPublic

Authored by sdesmalen on Mar 12 2019, 8:42 AM.



Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.

Diff Detail

Event Timeline

sdesmalen created this revision.Mar 12 2019, 8:42 AM
RKSimon added inline comments.Mar 12 2019, 9:05 AM
2 ↗(On Diff #190266)

Can you commit this to trunk with the current codegen so this patch shows the codegen delta?

nikic added a comment.Mar 12 2019, 9:58 AM

Would it be possible to avoid the FADDV node and directly match on VECREDUCE_FADD? It would be nice to replace all of these target specific nodes with their generic equivalents.

sdesmalen updated this revision to Diff 190422.Mar 13 2019, 8:19 AM

This patch now matches ISD::VECREDUCE_FADD directly, and shows the diff with test/CodeGen/AArch64/vecreduce-fadd.ll.

sdesmalen marked an inline comment as done.Mar 13 2019, 8:20 AM
nikic resigned from this revision.Mar 13 2019, 1:25 PM

The change generally looks good to me, but I'm not familiar enough with isel to review this on a technical level.

RKSimon added inline comments.Mar 13 2019, 1:48 PM

What happened to the accumulator argument? IMO we shouldn't have it all but the intrinsic currently defines it.

sdesmalen added inline comments.Mar 13 2019, 4:01 PM

The accumulator argument is not part of VECREDUCE_FADD, and is only represented in the VECREDUCE_STRICT_FADD for the reason that it _must_ start off with that value as part of the ordered reduction, where for the non-strict VECREDUCE_FADD it can be accumulated separately using a regular FADD. There should probably be a separate SDT_FPStrictVecReduce when we implement this for VECREDUCE_STRICT_FADD.

Looking in SelectionDAGBuilder::visitVectorReduce() I see that the accumulator operand is actually ignored for non-strict fadd reductions, which is something that we'll need to fix.

sdesmalen updated this revision to Diff 190597.Mar 14 2019, 4:17 AM

Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).

nikic added a subscriber: nikic.Mar 22 2019, 10:02 AM
sdesmalen marked an inline comment as done.Apr 5 2019, 8:57 AM



Please note that I have pinged off the question on the accumulator argument of the LLVM IR intrinsic to the mailing list in an RFC.
For the current definition of the SDNodes, having no accumulator argument here is correct.

sdesmalen updated this revision to Diff 205039.Jun 17 2019, 5:09 AM
sdesmalen edited the summary of this revision. (Show Details)
  • Rebased patch.
  • Following D60261, the scalar accumulator is either folded away or added separately as a scalar value, and so this patch only needs to focus on optimising the vector reduction itself.