Page MenuHomePhabricator

[AArch64] Make nxv1i1 types a legal type for SVE.
ClosedPublic

Authored by sdesmalen on Jun 27 2022, 9:45 AM.

Details

Summary

One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a <vscale x 16 x i1> predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Diff Detail

Event Timeline

sdesmalen created this revision.Jun 27 2022, 9:45 AM
sdesmalen requested review of this revision.Jun 27 2022, 9:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 9:45 AM

I

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6657

How are these changes related? I would have thought that if you're making v1i1 legal, that would avoid triggering any target-independent legalization infrastructure.

llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

I don't think, in general, we guarantee that padding bits of SVE predicates are zeroed. So the punpklo should be unnecessary.

sdesmalen added inline comments.Jun 27 2022, 2:38 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6657

They're needed for the existing <vscale x 1 x i1> test-case in llvm/test/CodeGen/AArch64/sve-select.ll:

select <vscale x 1 x i1> %p, <vscale x 1 x i64> %a, <vscale x 1 x i64> %dst

Before %p was assumed to be widened. But now, only %a and %b need widening, so it calls ModifyToType to widen <vscale x 1 x i1> %p as well to <vscale x 1 x i1>. This leads to the extra uzp1.

llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

I wasn't sure about this to be honest. If we can assume that the other lanes are always zeroed, then we can avoid having to always unpack the predicate when using this value (rather than when creating it). For example, the patterns for AArch64ptest don't first mask the predicate with a ptrue <required element type> and the instruction always tests each bit. That made me question if it was safe to assume the other lanes are always known to be zeroed.

efriedma added inline comments.Jun 27 2022, 4:48 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6657

Oh, that makes sense. And I guess that explains why the uzp1 shows up in sel_nxv1i64.

llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

As far as I know, nothing has changed since we last discussed this in https://reviews.llvm.org/D74471 . Unless I'm forgetting something...

Matt added a subscriber: Matt.Jun 28 2022, 2:22 PM
sdesmalen updated this revision to Diff 441042.Jun 29 2022, 9:20 AM

Moved some of the patterns to ISelLowering which simplifies the patch a bit.

sdesmalen added inline comments.Jun 29 2022, 9:22 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

You're right, in general the other lanes in the vector are indeed undef.

For the ptrue/splat(1) case I think we want to make an exception and make sure we generate a predicate where the other lanes are zeroed, such that we can use this value to AND with other predicates for explicitly zeroing the other lanes.

efriedma accepted this revision.Jun 29 2022, 1:30 PM

LGTM

llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

I agree we need some way to generate a predicate with the other lanes zeroed.

I'm not sure llvm.aarch64.sve.ptrue.nxv1i1 should map to that; if we convert.to.svbool or whatever, that should zero out the other bits anyway. But maybe it's less confusing this way if you have optimizations that specifically check for "ptrue".

This revision is now accepted and ready to land.Jun 29 2022, 1:30 PM
paulwalker-arm added inline comments.Jun 29 2022, 4:24 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4332–4336

Strictly speaking this is not correct for all predicate patterns. Do you need a truly generic implementation of PTRUE.Q or do you only care about specific cases?

4680–4681

Do we really need this or can we fix the cause, which I'm presuming is LowerSPLAT_VECTOR. I'd rather not give the impression we're extending the SVE intrinsics for types they're not intended to support.

llvm/test/CodeGen/AArch64/sve-intrinsics-pred-creation.ll
48 ↗(On Diff #440277)

I think the confusion here is that the test gives the impression we're extending the intrinsic whereas it's really just a route to exercise some code generation logic. However, for ISD::PTRUE we do rely on the implicit zeroing and so to me it makes sense for the emulated PTRUE.Q to honour the same commitment.

sdesmalen updated this revision to Diff 441350.Jun 30 2022, 4:31 AM
sdesmalen marked 6 inline comments as done.
  • Removed lowering for ptrue intrinsic.
  • Moved nxv1i1 lowering from whilelo to splat_vector.

Sorry for the incremental review. This was not intentional, just how my code review time worked out for this patch.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6663–6667

A few weeks back I extended TypeSize.h to include methods that allow us to handle such cases without resorting to getVectorMinNumElements(). See hasKnownScalarFactor and getKnownScalarFactor. The > code can use NVT.bitsGT(InVT).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10557

Rather than using PUNPKLO directly can you use ISD::EXTRACT_SUBVECTOR?

sdesmalen updated this revision to Diff 441639.Jul 1 2022, 1:54 AM
sdesmalen marked 2 inline comments as done.
  • Use ISD::EXTRACT_SUBVECTOR instead of creating PUNPKLO directly.
  • Use getKnownScalarFactor with ElementCount instead of operating on known minimum lanes.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
6663–6667

Thanks for pointing me to those interfaces, I missed that patch while I was OoO.
For the comparison, I believe that I can remove it entirely because X % Y where Y > X is always X and thus never 0.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10557

Good point!

paulwalker-arm accepted this revision.Jul 1 2022, 5:57 AM

One potential nit but otherwise this and the other i1 work is looking really good.

llvm/lib/Target/AArch64/AArch64CallingConvention.td
87–88

This is not relevant to this patch just observational.

For my own education do you know what this means? If I was to guess I'd say "we can pass such parameter types through memory"? If correct then I believe we only support nxv16i1 types when going throw memory.

llvm/lib/Target/AArch64/SVEInstrFormats.td
6263

Do we still need this change?

This revision was landed with ongoing or failed builds.Jul 1 2022, 8:13 AM
This revision was automatically updated to reflect the committed changes.
sdesmalen marked an inline comment as done.
sdesmalen added inline comments.Jul 1 2022, 8:13 AM
llvm/lib/Target/AArch64/AArch64CallingConvention.td
87–88

Yes, that's what it means. Fortunately it's not something we'll currently hit from user code, because that will always use <vscale x 16 x i1> types for svbool_t, but I guess we should clean this up at some point.

llvm/lib/Target/AArch64/SVEInstrFormats.td
6263

Nope, good catch, I'll remove it.