This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SelectionDAG] Add legalization for widening LOAD/MLOAD.
Needs ReviewPublic

Authored by efriedma on Jul 14 2021, 5:32 PM.

Details

Summary

By the magic of masked loads, a widened MLOAD is almost identical to the original MLOAD.

Need to handle a few more INSERT_SUBVECTOR legalization cases to avoid crashing on the testcase.

The code for computing the mask is unfortunately not very efficient; maybe we need a target-independent version of whilelo? Or a DAGCombine to form whilelo?

Diff Detail

Event Timeline

efriedma created this revision.Jul 14 2021, 5:32 PM
efriedma requested review of this revision.Jul 14 2021, 5:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 5:32 PM
sdesmalen added inline comments.Jul 19 2021, 1:14 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4048

Do we also want to do this for other element-counts that are not a power of 2? (e.g. <vscale x 6 x i8>)

Was this code intended to work for <vscale x 1 x eltty> ? (I tried that a few days ago with this patch and ran into some failures. I didn't really investigate further though)

4049

nit: VT.isKnownMultipleOf(2)

4135

Should MemoryLocation::UnknownSize only be used for the scalable case?

efriedma added inline comments.Jul 19 2021, 1:34 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4048

Should work for 1, I guess? Not sure I actually tried it. But note we only support a very limited set of operations. See also D105591.

For <vscale x 6 x i8>, we can use a different sequence: generate extloads for <vscale x 2 x i64> and <vscale x 4 x i32>, and merge them together. GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.

I'll add more testcases, in any case.

4135

There's a general restriction on memoperands in selectiondag: the size of the operand must be at least the size of the memory VT. I didn't really want to think about precisely what MemoryLocation we could be using instead...

efriedma updated this revision to Diff 359931.Jul 19 2021, 3:02 PM

More tests. Make "<vscale x 1 x i32>" loads work.

Matt added a subscriber: Matt.Jul 20 2021, 6:48 AM

Apologies for the delay in reviewing this, it has been on my to-do list for quite a while!

Thanks for adding support for <vscale x 1 x eltty> types and adding a test for widening the even-numbered ElementCounts (e.g. <vscale x 6 x i16>), it's nice to see that this works too.

Other than some minor comments, the patch looks mostly fine to me.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4881–4883 ↗(On Diff #359931)

nit: Is/would this be the same as:

SDValue Op0 = N->getOperand(0);
TLI.getTypeToTransformTo(*DAG.getContext(), Op0.getValueType());

?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4011

This code is identical to the block on lines 4022-4026, with the exception of s/NumElements/MinNumElements. Can you align the implementation in the two blocks and remove the if (VT.isScalableVector())? (and then move the report_fatal_error to line 4027)

4017

nit: unnecessary curly braces.

4048

Is the FIXME still relevant?

For <vscale x 6 x i8>, we can use a different sequence: generate extloads for <vscale x 2 x i64> and <vscale x 4 x i32>, and merge them together. GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.
I'll add more testcases, in any case.

Nice. From what I can see from the test you've added, <vscale x 6 x i16> is handled correctly by GenWidenVectorLoads.

4054

nit: can you move WidenVT closer to where it's used? (above line 4079)

llvm/test/CodeGen/AArch64/sve-split-load.ll
72

sve-split-load.ll may no longer be the best name for this file given that these new tests are about widening?

efriedma added inline comments.Aug 3 2021, 1:40 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4881–4883 ↗(On Diff #359931)

No, it's not the same.

We need to extend the element VT to the element VT of the result type. For SVE, the resulting vector type is usually promotable. If it is, we deal with it in PromoteIntOp_CONCAT_VECTORS.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4048

Yes, it's still an issue. The constant "2" is specific to SVE.

The question we need to figure out here is, will GenWidenVectorLoads succeed (and be profitable)? For SVE, it should succeed for any multiple of 2, because we can use 64-bit extending loads. Not sure about profitability for SVE; the generated code is a bit messy, but the messy bits are all invariant (and probably less messy once we start using whilelo optimally).

Other vector instruction sets might have different restrictions.