This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit
ClosedPublic

Authored by bsmith on Oct 5 2021, 5:13 AM.

Diff Detail

Event Timeline

bsmith created this revision.Oct 5 2021, 5:13 AM
bsmith requested review of this revision.Oct 5 2021, 5:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 5:13 AM
Matt added a subscriber: Matt.Oct 5 2021, 6:37 AM
paulwalker-arm added inline comments.Oct 5 2021, 7:58 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7638–7639

I think this is better done within getNode() so as to eliminate the possibility as early as possible.

7641

getMinSVEVectorSizeInBits doesn't look relevant here. The restriction is linked to our expected isel to either INSR or EXT instructions. So I think we can be explicit here and use 2048/Ty.getVectorElementType().getSizeInBits() along with a suitable comment about the range of EXT's index operand.

bsmith added inline comments.Oct 5 2021, 8:47 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7641

Is that really correct? If we just allow all vector sizes then surely you would end up with incorrect ext instructions? For example, if you had a VL of 128, a splice of a <vscale x 16 x i8> with an index of 31 would give an ext with an immediate of 31, which is only correct if your VL is >= 256, or are you suggesting that such a DAG node is malformed and hence undef?

paulwalker-arm added inline comments.Oct 5 2021, 9:03 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7641

The latter. We can assume the index in range and just need to ensure that when out of range we don't do anything bad (like access random memory). In our case we'll always emit an EXT instruction that will either do the correct thing when the index is in range (i.e. the defined case) or return the first vector when the index is out of range (i.e. the undefined case).

bsmith updated this revision to Diff 377280.Oct 5 2021, 9:25 AM
bsmith retitled this revision from [AArch64][SVE] Improve VECTOR_SPLICE codegen when vector length is known to [AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit.
  • Move idx==0 optimization to getNode
  • Allow lowering for all VL lengths, not just the current one.
paulwalker-arm added inline comments.Oct 6 2021, 3:27 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6162

Up to you but N3 must to be a constant so you can do if (cast<ConstantSDNode>(N3)->isNullValue()).

llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
10–13

I think the intent of all the first_idx tests is to validate the lower bound of the EXT isel patterns and so think they should all pass 1 to the vector.splice and then perhaps just have a single zero_idx test for the NOP case.

37–38

Is this safe? You've not touched this code so I've got a feeling the common expand code is incorrectly assuming the largest legal index+1 represents a safe index to use when expanding the code. I think it is safer to force an out-of-range index to 0.

209

Perhaps the _1_ in the function name should now be _31_?

paulwalker-arm added inline comments.Oct 6 2021, 3:46 AM
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
37–38

Please ignore this. I didn't spot the ll test change and so got my wires crossed.

bsmith updated this revision to Diff 377494.Oct 6 2021, 4:01 AM
  • Change getNode to assume constant value for 3rd operand of VECTOR_SPLICE
  • Change #0 index tests to #1 to check minimum ext range
  • Add single #0 index test
paulwalker-arm accepted this revision.Oct 6 2021, 4:10 AM
This revision is now accepted and ready to land.Oct 6 2021, 4:10 AM
peterwaller-arm added inline comments.Oct 6 2021, 4:38 AM
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
203–204

I expected to find a #2 here (1 x f16's worth of bytes), but there is a #8, which I can't quickly explain?

214

Continuing the logic of the earlier comment I make the immediate maximum to be 255 (bytes) - 2 (1 x f16) = 253, but we can't access byte 253 because we need a whole number of 2-byte halves, so I'm expecting to see 252/2 = 126 as input and #252 as the immediate.

peterwaller-arm accepted this revision.Oct 6 2021, 5:45 AM
peterwaller-arm added inline comments.
llvm/test/CodeGen/AArch64/named-vector-shuffles-sve.ll
203–204

A colleague reminded me that this is an unpacked type, so the container fits in 8 bytes. Now it all makes sense.