This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Insert subvector costs
ClosedPublic

Authored by dmgreen on Mar 3 2022, 2:22 AM.

Details

Summary

An insert subvector under aarch64 can often be done as a single lane mov operation. For example a v4i8 inserted into a v16i8 is a "s" mov, so long as the index is a multiple of 4. This teaches the cost model that, using code copied over from the X86 backend.

Some of the costs (v16i16_4_0) are still high because they get matched as a "SK_Select", not an "SK_InsertSubvector". D120879 has some codegen tests for inserting subvectors, which I will add as llvm/test/CodeGen/AArch64/insert-subvector.ll.

Diff Detail

Event Timeline

dmgreen created this revision.Mar 3 2022, 2:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 2:22 AM
dmgreen requested review of this revision.Mar 3 2022, 2:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 2:22 AM
dmgreen edited the summary of this revision. (Show Details)Mar 3 2022, 2:23 AM

Do we care about vector alignment? I am just looking at the x86 code, which does. Also, does this change apply to subvector extract too?

sdesmalen added inline comments.Mar 8 2022, 6:30 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2709

nit: It seems more natural to write isa<FixedVectorType>(Tp)

2710

If you check this is a fixed-width vector using LT.second.isFixedLengthVector(), you can remove the test for Tp being a fixed-length vector above.

2711

nit: move closer to use.

2714

I think this code silently assumes fixed-width == NEON, so I think it's worth adding a check that vector type being inserted into is less than 128bits. (and something similar for the value of the index being within a certain range). For fixed-length SVE vectors where wider vectors are legal, these ranges may go beyond 128bits.

It would also be useful to have a test for that case.

Do we care about vector alignment? I am just looking at the x86 code, which does. Also, does this change apply to subvector extract too?

By vector alignment - do you mean the alignment of the subvector inserted into the larger vector? If so - yep. We can insert aligned elements (a v4i8 inserted into a v16i8 is just an f32 lane mov providing the lane % 4 == 0), but not unaligned values.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2714

It was intending to work on larger-than-legal types. As in a v16i8 inserted (aligned) into a v32i8 isn't really an insert after all, it's just legalized to being a different register, and should be cheap as a result. llvm shuffles can always be larger than legal types. I can add a check that the legal type is at most 128bits.

dmgreen updated this revision to Diff 417220.Mar 22 2022, 2:27 AM
Matt added a subscriber: Matt.Mar 23 2022, 12:54 PM

Ping - any further comments?

sdesmalen accepted this revision.Apr 7 2022, 1:44 AM
sdesmalen added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2713

nit: Is this condition already implied by the fact that Kind == TTI::SK_InsertSubvector?

This revision is now accepted and ready to land.Apr 7 2022, 1:44 AM
dmgreen added inline comments.Apr 7 2022, 2:14 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2713

It is trying to defend against strange types which legalize to a scalar type.

This revision was landed with ongoing or failed builds.Apr 7 2022, 11:27 AM
This revision was automatically updated to reflect the committed changes.