This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
Closed, Public

Authored by junparser on Aug 25 2021, 8:12 AM.

Details

Summary

For vectors whose size is exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.
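
The selection rule described above can be sketched in standalone C++. This is a minimal illustration, not the code from the patch: `PredPattern`, `patternForElementCount` and `choosePattern` are hypothetical stand-ins that use no LLVM APIs, and the sketch assumes the register width counts as "known" when the minimum and maximum SVE sizes are equal and non-zero.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for AArch64SVEPredPattern. Only the names are
// mirrored; the enumerator values are not the real encodings.
enum class PredPattern {
  VL1, VL2, VL3, VL4, VL5, VL6, VL7, VL8,
  VL16, VL32, VL64, VL128, VL256, All
};

// Map an element count to the matching VLn pattern, if one exists.
inline std::optional<PredPattern> patternForElementCount(unsigned numElts) {
  switch (numElts) {
  case 1:   return PredPattern::VL1;
  case 2:   return PredPattern::VL2;
  case 3:   return PredPattern::VL3;
  case 4:   return PredPattern::VL4;
  case 5:   return PredPattern::VL5;
  case 6:   return PredPattern::VL6;
  case 7:   return PredPattern::VL7;
  case 8:   return PredPattern::VL8;
  case 16:  return PredPattern::VL16;
  case 32:  return PredPattern::VL32;
  case 64:  return PredPattern::VL64;
  case 128: return PredPattern::VL128;
  case 256: return PredPattern::VL256;
  default:  return std::nullopt;
  }
}

// Choose the predicate pattern for a fixed-length vector.
// Assumption: minSVEBits == maxSVEBits != 0 means the width is known exactly.
inline std::optional<PredPattern> choosePattern(unsigned numElts,
                                                unsigned eltBits,
                                                unsigned minSVEBits,
                                                unsigned maxSVEBits) {
  assert(eltBits > 0 && "element width must be non-zero");
  const unsigned vectorBits = numElts * eltBits;
  // The vector fills the whole register, so every lane is active.
  if (maxSVEBits != 0 && minSVEBits == maxSVEBits && vectorBits == maxSVEBits)
    return PredPattern::All;
  // Otherwise fall back to an explicit element-count pattern.
  return patternForElementCount(numElts);
}
```

With a register width fixed at 512 bits, a vector of 64 8-bit elements fills the register and gets `All`, while a vector of 32 8-bit elements keeps `VL32`.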

TestPlan: check-llvm

Event Timeline

junparser created this revision. Aug 25 2021, 8:12 AM
junparser requested review of this revision. Aug 25 2021, 8:12 AM
Herald added a project: Restricted Project. Aug 25 2021, 8:12 AM
paulwalker-arm added inline comments. Aug 25 2021, 8:29 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3887–3890

There should already be a placeholder for this logic. If you look at getPredicateForFixedLengthVector you'll see a TODO comment.

junparser added inline comments. Aug 25 2021, 8:00 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3887–3890

In the NFC patch I made every ptrue creation go through getPTrue, so with this change we can even handle the sve.ptrue intrinsic, which we have seen in some cases.

paulwalker-arm added inline comments. Aug 26 2021, 3:48 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3887–3890

This doesn't really change my mind. getPTrue exists purely as a convenience routine, which I think should always emit exactly what the caller asks for. The design for fixed-length code generation already has a placeholder for this logic.

Which just leaves the intrinsic case. This is best handled explicitly, either when lowering the intrinsic or perhaps even as an instcombine which can open up more optimisation opportunities.
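
One way to picture the explicit handling suggested here is a small canonicalization step that rewrites a requested VLn pattern to "all" only when the register width is known exactly and the pattern covers every lane. The sketch below is standalone C++ with hypothetical names (`PredPattern`, `elementsForPattern`, `canonicalizePTruePattern`); the thread does not say how or where such a rewrite was implemented, so treat this as an illustration of the idea rather than the actual lowering or instcombine code.

```cpp
#include <cassert>

// Hypothetical stand-in for the SVE predicate patterns discussed above.
enum class PredPattern {
  VL1, VL2, VL3, VL4, VL5, VL6, VL7, VL8,
  VL16, VL32, VL64, VL128, VL256, All
};

// Number of lanes a VLn pattern activates; 0 for patterns without a fixed count.
inline unsigned elementsForPattern(PredPattern p) {
  switch (p) {
  case PredPattern::VL1:   return 1;
  case PredPattern::VL2:   return 2;
  case PredPattern::VL3:   return 3;
  case PredPattern::VL4:   return 4;
  case PredPattern::VL5:   return 5;
  case PredPattern::VL6:   return 6;
  case PredPattern::VL7:   return 7;
  case PredPattern::VL8:   return 8;
  case PredPattern::VL16:  return 16;
  case PredPattern::VL32:  return 32;
  case PredPattern::VL64:  return 64;
  case PredPattern::VL128: return 128;
  case PredPattern::VL256: return 256;
  case PredPattern::All:   return 0;
  }
  return 0;
}

// Rewrite a requested pattern to All when it provably covers the whole register.
// knownSVEBits == 0 means the register width is not known exactly (assumption).
inline PredPattern canonicalizePTruePattern(PredPattern requested,
                                            unsigned eltBits,
                                            unsigned knownSVEBits) {
  const unsigned numElts = elementsForPattern(requested);
  if (numElts != 0 && knownSVEBits != 0 && numElts * eltBits == knownSVEBits)
    return PredPattern::All;
  // Otherwise keep exactly what the caller asked for.
  return requested;
}
```

Keeping this step separate from the ptrue-creation helper matches the point made above: the helper emits exactly the requested pattern, and the decision to substitute "all" is made explicitly by the caller.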

junparser added inline comments. Aug 26 2021, 4:21 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3887–3890

Makes sense to me, will update.

junparser updated this revision to Diff 368851. Aug 26 2021, 4:55 AM

Address comment. Only deal with fixed-length vectors.

Matt added a subscriber: Matt. Aug 26 2021, 10:04 AM
paulwalker-arm accepted this revision. Aug 27 2021, 3:10 AM

I know the patch triggers existing test changes, but I'd prefer to have an explicit test for this functionality. Nothing complicated, perhaps just a 512-bit vector add/fadd test for each element type. You can probably just copy them from sve-fixed-length-{int,fp}-arith.ll (i.e. add_v256i8, add_v128i16 ...) and let update_llc_test_checks.py do its stuff.

This revision is now accepted and ready to land. Aug 27 2021, 3:10 AM
junparser updated this revision to Diff 369073. Aug 27 2021, 4:50 AM

Add test case.

This revision was landed with ongoing or failed builds. Aug 27 2021, 5:04 AM
This revision was automatically updated to reflect the committed changes.