This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Legalisation of truncate for scalable vectors
ClosedPublic

Authored by kmclaughlin on Aug 25 2020, 9:47 AM.

Details

Summary

Truncating from an illegal SVE type to a legal type, e.g.
trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Diff Detail

Event Timeline

kmclaughlin created this revision.Aug 25 2020, 9:47 AM
Herald added a project: Restricted Project. · View Herald Transcript
kmclaughlin requested review of this revision.Aug 25 2020, 9:47 AM

It's probably worth considering adding a target-independent opcode for this sort of truncate, but I guess this approach is okay, at least for now.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4681

How do we end up here? I guess we split the truncate into two truncs, and then concatenate the two truncates?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
965

I'd prefer to list out the relevant types more explicitly; the set of illegal types listed in integer_scalable_vector_valuetypes() could change if other targets need additional scalable types. Can we use the same {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32} list we use for EXTRACT_SUBVECTOR?

9110

Do we need to check the result type is legal?

9132

Is the Vec0->getOperand(1).isUndef() check actually necessary?

Can we have the same set of checks for optimizing the "other" half in the low and high cases? Some of them won't be immediately useful, but it helps make the logic more clear.

kmclaughlin marked 3 inline comments as done.
  • Set EXTRACT_SUBVECTOR to Custom for the illegal types nxv8i8, nxv4i16 & nxv2i32 only.
  • Changed LowerINSERT_SUBVECTOR to apply the same checks for both the low & high cases.
kmclaughlin marked an inline comment as not done.Sep 1 2020, 10:50 AM
kmclaughlin added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4681

That's correct, we end up here when the result type is legal but the operand needs to be split, e.g.
trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>
In this example we try to split the trunc in two truncates with return types of <vscale x 2 x i32>, and concatenate them.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9132

I changed this slightly so that we are using the same set of checks on both halves, and removed the extra isUndef() check as I don't think it was necessary here.

efriedma added inline comments.Sep 1 2020, 1:25 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9156

Vec0->getOperand(0); looks wrong; should it be Vec0->getOperand(Idx == 0 ? 0 : 1)?

david-arm added inline comments.Sep 2 2020, 12:47 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9156

I think it should be

Vec0->getOperand(Idx == 0 ? 1 : 0)

since we want to keep lower half when inserting a subvec into the upper half and vice-versa.

paulwalker-arm added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4689–4691

Out of interest why do we need this restriction? The common algorithm seems like a trivial loop of the form

Res = getUndef(ResVT)
for i=0-NumOp; {
  Op = N->getOperand(i);
  unsigned Idx = Op.getValueType().getVectorMinNumElement();
  Res = getNode(INSERT_SUBVECTOR, ResVT, Res, Op, getConstant(Idx * i));
}
return Res;

I guess it's not for this patch but I'm also wondering if this could be the canonical form when type legalising CONCAT_VECTORS since the more instances of BUILD_VECTOR we can remove the better.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9133

You don't need this because VT and Op0VT are the same thing, which is probably why Op1VT was named InVT, not that there's anything wrong with the rename if you prefer it.

9135

Probably a bit left field but the issue here is the integer sub-vector is not type legal. Is it possible to bitcast the operation to floating point, then we just need to add patterns to InstrInfo.td.

I can imagine DAGCombine might undo the transform but it's perhaps worth an experiment, even if it's after landing the patch since I imagine we'll want to support floating-point inserts at some point.

9155–9156

FYI: This looks more like a DAGCombine. Considering how much we're relying on UPK/UZP for legalisation I'm wondering if we should (separately from this patch) introduce common ISD nodes for these operations. Assuming there isn't already a way to do combines for target nodes? What does @eli.friedman think?

efriedma added inline comments.Sep 2 2020, 5:03 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9155–9156

We could split this into a combine, I guess?

You can run DAGCombines on target-specific nodes by just adding a case to AArch64TargetLowering::PerformDAGCombine, I think.

I think we should avoid adding new target-independent nodes that are only useful for scalable vectors if we don't need them in target-independent code; it's hard to tell what common nodes would actually be useful without another backend to use them.

kmclaughlin marked 4 inline comments as done.
kmclaughlin edited the summary of this revision. (Show Details)
  • Removed the restriction from PromoteIntOp_EXTRACT_SUBVECTOR which prevented concatenating more than 2 scalable vectors
  • Added a case to PerformDAGCombine for UNPKHI/LO which combines the following: unpk(undef) -> undef & unpk(uzp1(x)) -> x
kmclaughlin added inline comments.Sep 4 2020, 9:23 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4689–4691

The only reason for this restriction was that LowerINSERT_SUBVECTOR checks that the smaller subvector is half the size of the insert_subvector before creating an unpk + uzp1. I've changed this to remove the restriction and added a loop over the number of operands instead.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9135

I did try your suggestion here of bitcasting the operation to floating point, though I then hit another issue where the bitcast of the sub-vector (e.g. bitcasting a 2i32 truncate -> 2f32) has an operand which is not type legal.
The PromoteIntOp_BITCAST function tries to handle this by creating a store + load, but I'm not sure this is the right approach in this situation?

9155–9156

Added a case for UUNPKHI/LO and split this into a DAG combine

Looks good to me, but I'll defer the final LGTM to @paulwalker-arm and @efriedma! I learned something new from this patch too - didn't know you could add target ISD combines like that.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4692

nit: Perhaps better to rename Idx to something like NumSubVecElems or something like that, since it's not really an index now?

paulwalker-arm added inline comments.Sep 7 2020, 5:51 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9110

As well as what Eli asked I also don't think InVT.isInteger is required because by INSERT_SUBVECTOR's definition the subvector must have the same element type as the result vector.

13037–13038

In it's current form I don't think this is correct. The UUNPK instructions zero the even lanes and so you need to ensure the chosen child of the UZP1 honours this requirement. Do you know the exact sequence you expect to see? For example perhaps the pattern you're after is UUNPKLO(UZP1(UUNPKLO(X), ??)) -> UUNPKLO(X) or some UZP1(UUNPKLO(UZP1(X,Y))... ->UZP1(X,Y) like sequence.

kmclaughlin marked an inline comment as done.
  • Removed DAGCombine for UNPKHI/LO and added a new case for UZP1, so that we can check for the more specific case of uzp1(unpklo(uzp1(x, y)), z)
  • Removed an unnecessary check that InVT is an integer in LowerINSERT_SUBVECTOR
  • Renamed Idx in PromoteIntOp_EXTRACT_SUBVECTOR to OpNumElts
paulwalker-arm accepted this revision.Sep 9 2020, 5:39 AM

Looks good assuming the new test doesn't throw up any surprises.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13031

Stylistically this function could do with a few blank lines and is perhaps clearer if the two UzpOp variables are named X and Z respectively.

llvm/test/CodeGen/AArch64/sve-split-trunc.ll
21

You're missing trunc_i32toi8 for the complete collection.

This revision is now accepted and ready to land.Sep 9 2020, 5:39 AM
This revision was automatically updated to reflect the committed changes.
kmclaughlin marked 2 inline comments as done.