This is an archive of the discontinued LLVM Phabricator instance.

[AARCH64][SVE] Do not optimize vector conversions
ClosedPublic

Authored by bzinodev on Jan 10 2023, 2:53 PM.

Details

Summary

shuffle_vector instructions are serialized targeting SVE fixed vectors, see https://reviews.llvm.org/D139111. This patch disables optimizeExtendOrTruncateConversion peepholes that generates shuffle_vector.

Diff Detail

Event Timeline

bzinodev created this revision.Jan 10 2023, 2:53 PM
bzinodev requested review of this revision.Jan 10 2023, 2:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 2:53 PM
Matt added a subscriber: Matt.Jan 10 2023, 3:55 PM

Hi Zino,

Looks like the test case is failing? See https://reviews.llvm.org/harbormaster/unit/view/5711271/

I am wondering how big of a hammer this is. Are there no cases at all where doing this is beneficial?

Is the test case minimal? Think I see a loop, which we don't need? To see the codegen differences, it would be good if we can precommit a test, but then we want it more minimal if that is possible.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14181

Nit: I would prefer a pointer to source-code (e.g. a function name) rather than a hyperlink, or just omit it if it is obvious.

It looks like this patch is SVE-related, rather than SME. Can you change the title from [AARCH64][SME] to [AARCH64][SVE] please? Thanks!

bzinodev updated this revision to Diff 488038.Jan 11 2023, 8:44 AM
bzinodev updated this revision to Diff 488316.Jan 11 2023, 11:43 AM
bzinodev retitled this revision from [AARCH64][SME] Do not optimize vector conversions to [AARCH64][SVE] Do not optimize vector conversions.
SjoerdMeijer accepted this revision.Jan 12 2023, 3:11 AM

Thanks for reducing the test case. I think bailing out early here makes sense indeed, so LGTM.

This revision is now accepted and ready to land.Jan 12 2023, 3:11 AM
paulwalker-arm added inline comments.Jan 13 2023, 5:21 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14178

For clarification, is this a bad optimisation when SVE is available? or is it the case the current code generation for SVE is suboptimal?

bzinodev marked an inline comment as done.Jan 15 2023, 1:19 PM
bzinodev added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14178

In a nutshell I don't know. Is the zext peep that rely on tbl instruction is always profitable even when targeting NEON? I have limited access to ARM HW, this peep performs slower on my small toy test.

On SVE, disabling the peep, uunpklo is generated and it is fast enough. Please feel free to propose better code generation options. thanks

I have double checked with Zino that a significant performance uplift was observed for a benchmark app, measured on SVE hardware.
While Zino is in the process of getting llvm commit rights, I am happy/confident to land this on his behalf as the codegen improvement for the examples I have seen are obvious. I think we can iterate on this should there be other/better things to do in this area.

This revision was landed with ongoing or failed builds.Jan 19 2023, 8:50 AM
This revision was automatically updated to reflect the committed changes.