This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y)
ClosedPublic

Authored by RosieSumpter on Jul 13 2022, 1:58 AM.

Details

Summary

This patch adds an SVE pattern to recognize the use of a select with an
fadda in the form fadda(ptrue, x, select(mask, y, -0.0)). In this case
the select can be folded away, with the select mask used as the
predicate for fadda. This improves the codegen when vectorizing loops
with ordered fp reductions.

Tests have been added for the f16 cases, although the folding doesn't
currently work for these as -0.0 isn't treated as a legal f16 immediate.

Diff Detail

Event Timeline

RosieSumpter created this revision.Jul 13 2022, 1:58 AM
Herald added a project: Restricted Project. · View Herald Transcript
RosieSumpter requested review of this revision.Jul 13 2022, 1:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 1:58 AM
Matt added a subscriber: Matt.Jul 13 2022, 4:17 PM
paulwalker-arm added inline comments.
llvm/lib/Target/AArch64/SVEInstrFormats.td
5129–5134 ↗(On Diff #444189)

It is possible to change AArch64fadda_p into a PatFrags to remove the need for these? See AArch64bic_node and AArch64bic.

RosieSumpter marked an inline comment as done.
  • Used PatFrags
llvm/lib/Target/AArch64/SVEInstrFormats.td
5129–5134 ↗(On Diff #444189)

Hey Paul, I've had a go at using PatFrags instead - is this what you had in mind?

paulwalker-arm added inline comments.Jul 15 2022, 9:20 AM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5129–5134 ↗(On Diff #444189)

Not quite, which make me think I've asked for something that's not possible. I'll investigate further.

paulwalker-arm added inline comments.Jul 15 2022, 9:36 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
287–290

You shouldn't need duplicate patterns within the PatFrag based on types as this kind of defeats the point of a PatFrags. The main exception to this is the scalar operand of splat_vector due to the way it's defined.

The problem here is that you cannot infer the types which is down to the loose definition of AArch64fadda_p_node or rather SDT_AArch64ReduceWithInit. You'll have to double check it's correct but when I tighten it up to say

SDTCisVec<1>, SDTCVecEltisVT<1,i1>, SDTCisVec<3>, SDTCisSameNumEltsAs<1,3>

i.e. The first operand is a predicate vector and has the same number of elements as the third operand.

I only needed a single f32 and f64 pattern without any other types mentioned.

RosieSumpter marked an inline comment as done.
  • Change definition of SDT_AArch64ReduceWithInit so that fadda patterns don't need to be replicated for different predicate vector types.
paulwalker-arm accepted this revision.Jul 18 2022, 5:02 AM

Given AArch64ISD::FADDA_PRED exists I guess there's an argument this could be a DAG combine. However, this isel route looks clean enough and you're adding things that will be beneficial regardless.

This revision is now accepted and ready to land.Jul 18 2022, 5:02 AM
This revision was landed with ongoing or failed builds.Jul 19 2022, 12:39 AM
This revision was automatically updated to reflect the committed changes.