Page MenuHomePhabricator

[AArch64] Use faddp to implement fadd reductions.

Authored by sdesmalen on Mar 12 2019, 8:42 AM.



Custom-expand legal VECREDUCE_FADD SDNodes
to benefit from pair-wise faddp instructions.

Diff Detail

Event Timeline

sdesmalen created this revision.Mar 12 2019, 8:42 AM
RKSimon added inline comments.Mar 12 2019, 9:05 AM
2 ↗(On Diff #190266)

Can you commit this to trunk with the current codegen so this patch shows the codegen delta?

nikic added a comment.Mar 12 2019, 9:58 AM

Would it be possible to avoid the FADDV node and directly match on VECREDUCE_FADD? It would be nice to replace all of these target specific nodes with their generic equivalents.

sdesmalen updated this revision to Diff 190422.Mar 13 2019, 8:19 AM

This patch now matches ISD::VECREDUCE_FADD directly, and shows the diff with test/CodeGen/AArch64/vecreduce-fadd.ll.

sdesmalen marked an inline comment as done.Mar 13 2019, 8:20 AM
nikic resigned from this revision.Mar 13 2019, 1:25 PM

The change generally looks good to me, but I'm not familiar enough with isel to review this on a technical level.

RKSimon added inline comments.Mar 13 2019, 1:48 PM
284 ↗(On Diff #190422)

What happened to the accumulator argument? IMO we shouldn't have it all but the intrinsic currently defines it.

sdesmalen added inline comments.Mar 13 2019, 4:01 PM
284 ↗(On Diff #190422)

The accumulator argument is not part of VECREDUCE_FADD, and is only represented in the VECREDUCE_STRICT_FADD for the reason that it _must_ start off with that value as part of the ordered reduction, where for the non-strict VECREDUCE_FADD it can be accumulated separately using a regular FADD. There should probably be a separate SDT_FPStrictVecReduce when we implement this for VECREDUCE_STRICT_FADD.

Looking in SelectionDAGBuilder::visitVectorReduce() I see that the accumulator operand is actually ignored for non-strict fadd reductions, which is something that we'll need to fix.

sdesmalen updated this revision to Diff 190597.Mar 14 2019, 4:17 AM

Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).

nikic added a subscriber: nikic.Mar 22 2019, 10:02 AM
sdesmalen marked an inline comment as done.Apr 5 2019, 8:57 AM


284 ↗(On Diff #190422)

Please note that I have pinged off the question on the accumulator argument of the LLVM IR intrinsic to the mailing list in an RFC.
For the current definition of the SDNodes, having no accumulator argument here is correct.

sdesmalen updated this revision to Diff 205039.Jun 17 2019, 5:09 AM
sdesmalen edited the summary of this revision. (Show Details)
  • Rebased patch.
  • Following D60261, the scalar accumulator is either folded away or added separately as a scalar value, and so this patch only needs to focus on optimising the vector reduction itself.

@sdesmalen whatever happened to this patch?

@sdesmalen whatever happened to this patch?

I totally forgot about this patch tbh, but I think the logic still applies. I'll rebase the patch.

dmgreen added inline comments.Jan 4 2021, 7:05 AM
4425 ↗(On Diff #205039)

Should/could the second Rn be IMPLICIT_DEF? Same for below.

sdesmalen updated this revision to Diff 314385.Jan 4 2021, 8:41 AM
  • Rebased patch
  • Use IMPLICIT_DEF for second source operand to faddp.
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2021, 8:41 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen accepted this revision.Jan 4 2021, 10:24 AM

Thanks. LGTM


Can you make sure there is some test where the first element isn't -0.0. (I think it should work fine, but it would be good to make sure there is a test for it)

This revision is now accepted and ready to land.Jan 4 2021, 10:24 AM
sdesmalen updated this revision to Diff 314520.Jan 5 2021, 1:08 AM
sdesmalen marked an inline comment as done.

Added test with different start value for fadd reduction.

This revision was automatically updated to reflect the committed changes.