This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Convert SRSHL to LSL when fed from an ABS intrinsic
ClosedPublic

Authored by bsmith on May 9 2022, 7:19 AM.

Diff Detail

Event Timeline

bsmith created this revision. May 9 2022, 7:19 AM
bsmith requested review of this revision. May 9 2022, 7:19 AM
Herald added a project: Restricted Project. May 9 2022, 7:19 AM

Hi @bsmith, this looks like a sensible optimisation! I suppose we can also do something similar when the input is an 'and' too? e.g.

%mask = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 0x7FFF)
%and = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %mask)
%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %and, <vscale x 8 x i16> %splat)
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1230

Do you mean SRSHL instead of SQSHL in all the comments?

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-abs-srshl.ll
43

nit: Maybe it's worth splitting the tests out into Positive and Negative tests, i.e. top half positive, bottom half negative? I think that makes it a bit easier to see what's going on.

Hi @bsmith, this looks like a sensible optimisation! I suppose we can also do something similar when the input is an 'and' too? e.g.

%mask = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 0x7FFF)
%and = tail call <vscale x 8 x i16> @llvm.aarch64.sve.and.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %mask)
%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %and, <vscale x 8 x i16> %splat)

I think it's probably not worth handling a case like this; the abs case is one we have explicitly seen in real-world code. We could in theory go further and add even more cases (shr, for example), but given there is no obvious generic way to do this, we'd have to add a special case for each intrinsic we might care about, all just in case someone happened to write, or end up with, suboptimal ACLE code. (srshl is not something the compiler generates; it will only ever come from ACLE intrinsics.) I think it's best to extend this only if and when we need it.
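For context, the abs-fed pattern the patch targets is the analogue of the 'and' example quoted above. A rough sketch of the IR shape (operand names invented for illustration; the srshl here can be rewritten to an lsl because abs guarantees a non-negative input and the splat shift amount is non-negative):

```llvm
%abs = tail call <vscale x 8 x i16> @llvm.aarch64.sve.abs.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %pg, <vscale x 8 x i16> %a)
%splat = tail call <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16 2)
%shr = tail call <vscale x 8 x i16> @llvm.aarch64.sve.srshl.nxv8i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %abs, <vscale x 8 x i16> %splat)
```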

bsmith updated this revision to Diff 428668. May 11 2022, 8:07 AM
  • Fix typos in comments, SQSHL -> SRSHL.
  • Rearrange tests to have positive cases first.
This comment was removed by bsmith.
paulwalker-arm added inline comments. May 16 2022, 6:15 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1262–1265

I don't suppose m_NonNegative() works, or can be made to work, here?

Ideally I'd hope for something similar in the higher-up code paths so that we can perhaps catch more cases for free, but realistically I think this is the only place where it might work out of the box today.

bsmith updated this revision to Diff 429708. May 16 2022, 7:13 AM
  • Use m_NonNegative instead of manually checking for splats and their values
paulwalker-arm accepted this revision. May 17 2022, 10:17 AM
paulwalker-arm added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1234–1235

I think you can use IRBuilder<> Builder(II); here.

This revision is now accepted and ready to land. May 17 2022, 10:17 AM
This revision was landed with ongoing or failed builds. May 19 2022, 7:08 AM
This revision was automatically updated to reflect the committed changes.