This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add hadd and rhadd support
ClosedPublic

Authored by dmgreen on Aug 15 2022, 12:10 AM.

Details

Summary

This adds basic HADD and RHADD support for SVE, by marking the AVGFLOOR and AVGCEIL as custom and converting those to HADD_PRED/RHADD_PRED AArch64 nodes. Both the existing intrinsics and the _PRED nodes are then lowered to the _ZPmZ instructions.

The <vscale x 2 x i64> tests are added here as the underlying code extends to a <vscale x 2 x i128>, which fails to lower. The others are precommitted to show the differences.

Diff Detail

Event Timeline

dmgreen created this revision.Aug 15 2022, 12:10 AM
dmgreen created this object with visibility "No One".
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 12:10 AM
dmgreen requested review of this revision.Aug 15 2022, 12:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2022, 12:10 AM
dmgreen changed the visibility from "No One" to "Public (No Login Required)".
Matt added a subscriber: Matt.Aug 15 2022, 1:20 PM
dmgreen updated this revision to Diff 481229.Dec 8 2022, 4:05 AM
dmgreen edited the summary of this revision. (Show Details)

I think this now has enough in tree to allow it to work. I've given it a rebase, but it has been a very long time since I wrote it.

paulwalker-arm added inline comments.Dec 8 2022, 4:32 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5188–5202

We use the _PRED suffix for nodes where the result of inactive lanes does not matter. This means this transform is not strictly safe (as in somebody might do something erroneous based on the name).

We have convertMergedOpToPredOp to convert from predicated intrinsics to _PRED nodes when it is safe to do so. The alternative is to change the ISD name but that seems wrong because for the IR side of things you really do want _PRED nodes to free up isel/register allocation by selecting.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3308–3309

This ties in with my previous comment because typically here we'd create SHADD_ZPZZ_UNDEF_B pseudo instructions to free up the register allocator. You don't need to bother with this for this patch if you don't want to but otherwise you'll need to handle AArch64shadd_p and int_aarch64_sve_shadd as separate patterns because they have different behaviour.

For an example of how we normally handle this see MUL_ZPmZ verses MUL_ZPZZ.

dmgreen updated this revision to Diff 481352.Dec 8 2022, 10:28 AM
dmgreen edited the summary of this revision. (Show Details)

Ah I see. Thanks. This alters the intrinsics to go via a PatFrag instead.

paulwalker-arm accepted this revision.Dec 12 2022, 2:15 AM

Small code placement issue but otherwise looks good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1297–1304

I suspect you didn't mean to put this within the {MVT::nxv16i1, MVT::nxv8i1... loop? Perhaps it's worth placing within the existing loop for {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64} and protecting with hasSVE2 there?

This revision is now accepted and ready to land.Dec 12 2022, 2:15 AM
This revision was landed with ongoing or failed builds.Dec 14 2022, 1:25 AM
This revision was automatically updated to reflect the committed changes.