Details

Reviewers

sdesmalen
efriedma
david-arm
paulwalker-arm
rengolin

Commits

rGcd89f5c91b4b: [SVE][CodeGen] Legalisation of truncate for scalable vectors

Summary

Truncating from an illegal SVE type to a legal type, e.g.
trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.
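
The INSERT_SUBVECTOR chain described above can be modelled outside LLVM. The following is a minimal sketch on plain `std::vector`s, not the SelectionDAG code itself: `Vec`, `insertSubvector` and `concatViaInserts` are hypothetical names, vscale is assumed to be fixed so that lane counts are concrete, and the undefined starting vector is represented by zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<uint32_t>;

// Model of INSERT_SUBVECTOR: overwrite Sub.size() lanes of Res starting at Idx.
Vec insertSubvector(Vec Res, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Res.size() && "subvector must fit in the result");
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Res[Idx + I] = Sub[I];
  return Res;
}

// Model of the scalable path: build the concatenation as a chain of inserts
// into an initially undefined (here: zero-filled) result vector.
// Assumes every operand has the same number of lanes.
Vec concatViaInserts(const std::vector<Vec> &Ops) {
  assert(!Ops.empty() && "need at least one operand");
  std::size_t OpNumElts = Ops.front().size();
  Vec Res(OpNumElts * Ops.size(), 0);
  for (std::size_t OpIdx = 0; OpIdx < Ops.size(); ++OpIdx)
    Res = insertSubvector(Res, Ops[OpIdx], OpIdx * OpNumElts);
  return Res;
}
```

Each operand lands at lane offset `OpIdx * OpNumElts`, which is the same index computation the patch passes to `DAG.getIntPtrConstant`.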

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. · Aug 25 2020, 9:47 AM

Herald added a reviewer: rengolin. · Aug 25 2020, 9:47 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, psnobl, hiraditya and 2 others.

kmclaughlin requested review of this revision. · Aug 25 2020, 9:47 AM

Harbormaster completed remote builds in B69466: Diff 287693. · Aug 25 2020, 10:19 AM

It's probably worth considering adding a target-independent opcode for this sort of truncate, but I guess this approach is okay, at least for now.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4703	How do we end up here? I guess we split the truncate into two truncs, and then concatenate the two truncates?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
965	I'd prefer to list out the relevant types more explicitly; the set of illegal types listed in integer_scalable_vector_valuetypes() could change if other targets need additional scalable types. Can we use the same `{MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}` list we use for EXTRACT_SUBVECTOR?
9110	Do we need to check the result type is legal?
9132	Is the `Vec0->getOperand(1).isUndef()` check actually necessary? Can we have the same set of checks for optimizing the "other" half in the low and high cases? Some of them won't be immediately useful, but it helps make the logic more clear.

Set EXTRACT_SUBVECTOR to Custom for the illegal types nxv8i8, nxv4i16 & nxv2i32 only.
Changed LowerINSERT_SUBVECTOR to apply the same checks for both the low & high cases.

kmclaughlin marked an inline comment as not done. · Sep 1 2020, 10:50 AM

kmclaughlin added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4703	That's correct, we end up here when the result type is legal but the operand needs to be split, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`. In this example we try to split the trunc into two truncates with return types of `<vscale x 2 x i32>`, and concatenate them.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9132	I changed this slightly so that we are using the same set of checks on both halves, and removed the extra `isUndef()` check as I don't think it was necessary here.
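
The splitting described in the reply above can be checked on plain arrays. This is an illustrative model only, assuming vscale = 1 so that `<vscale x 4 x i64>` has four lanes; `truncLanes` and `truncViaSplit` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Truncate every 64-bit lane to its low 32 bits.
std::vector<uint32_t> truncLanes(const std::vector<uint64_t> &In) {
  std::vector<uint32_t> Out;
  Out.reserve(In.size());
  for (uint64_t V : In)
    Out.push_back(static_cast<uint32_t>(V));
  return Out;
}

// Split the operand in half, truncate each half, then concatenate the
// two narrower results (the step that reaches PromoteIntOp_CONCAT_VECTORS).
std::vector<uint32_t> truncViaSplit(const std::vector<uint64_t> &In) {
  assert(In.size() % 2 == 0 && "operand must split evenly");
  std::size_t Half = In.size() / 2;
  std::vector<uint64_t> Lo(In.begin(), In.begin() + Half);
  std::vector<uint64_t> Hi(In.begin() + Half, In.end());
  std::vector<uint32_t> Out = truncLanes(Lo);
  std::vector<uint32_t> HiT = truncLanes(Hi);
  Out.insert(Out.end(), HiT.begin(), HiT.end());
  return Out;
}
```

In the model the split form and the direct truncate agree lane for lane; the difficulty in the patch is only that each half has the illegal type `<vscale x 2 x i32>`.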

efriedma added inline comments. · Sep 1 2020, 1:25 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9156	`Vec0->getOperand(0);` looks wrong; should it be `Vec0->getOperand(Idx == 0 ? 0 : 1)`?

david-arm added inline comments. · Sep 2 2020, 12:47 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9156	I think it should be `Vec0->getOperand(Idx == 0 ? 1 : 0)`, since we want to keep the lower half when inserting a subvec into the upper half and vice versa.

paulwalker-arm added a subscriber: eli.friedman. · Sep 2 2020, 4:15 AM

paulwalker-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4711–4713	Out of interest, why do we need this restriction? The common algorithm seems like a trivial loop of the form:
    Res = getUndef(ResVT);
    for (i = 0; i < NumOp; ++i) {
      Op = N->getOperand(i);
      unsigned Idx = Op.getValueType().getVectorMinNumElements();
      Res = getNode(INSERT_SUBVECTOR, ResVT, Res, Op, getConstant(Idx * i));
    }
    return Res;
I guess it's not for this patch, but I'm also wondering if this could be the canonical form when type legalising CONCAT_VECTORS, since the more instances of BUILD_VECTOR we can remove the better.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9133	You don't need this because VT and Op0VT are the same thing, which is probably why Op1VT was named InVT, not that there's anything wrong with the rename if you prefer it.
9135	Probably a bit left field, but the issue here is that the integer sub-vector is not type legal. Is it possible to bitcast the operation to floating point, so that we just need to add patterns to InstrInfo.td? I can imagine DAGCombine might undo the transform, but it's perhaps worth an experiment, even if it's after landing the patch, since I imagine we'll want to support floating-point inserts at some point.
9155–9156	FYI: This looks more like a DAGCombine. Considering how much we're relying on UNPK/UZP for legalisation, I'm wondering if we should (separately from this patch) introduce common ISD nodes for these operations, assuming there isn't already a way to do combines for target nodes. What does @eli.friedman think?

efriedma added inline comments. · Sep 2 2020, 5:03 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9155–9156	We could split this into a combine, I guess? You can run DAGCombines on target-specific nodes by just adding a case to AArch64TargetLowering::PerformDAGCombine, I think. I think we should avoid adding new target-independent nodes that are only useful for scalable vectors if we don't need them in target-independent code; it's hard to tell what common nodes would actually be useful without another backend to use them.

Removed the restriction from PromoteIntOp_CONCAT_VECTORS which prevented concatenating more than 2 scalable vectors
Added a case to PerformDAGCombine for UNPKHI/LO which combines the following: unpk(undef) -> undef & unpk(uzp1(x)) -> x

kmclaughlin added inline comments. · Sep 4 2020, 9:23 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4711–4713	The only reason for this restriction was that LowerINSERT_SUBVECTOR checks that the smaller subvector is half the size of the insert_subvector before creating an unpk + uzp1. I've changed this to remove the restriction and added a loop over the number of operands instead.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9135	I did try your suggestion here of bitcasting the operation to floating point, though I then hit another issue where the bitcast of the sub-vector (e.g. bitcasting a 2i32 truncate -> 2f32) has an operand which is not type legal. The PromoteIntOp_BITCAST function tries to handle this by creating a store + load, but I'm not sure this is the right approach in this situation?
9155–9156	Added a case for UUNPKHI/LO and split this into a DAG combine
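
The UNPK + UZP1 lowering that these replies refer to can be sketched as a lane-level model. This is only an illustration under stated assumptions: vscale = 1, a little-endian lane layout (so the even 32-bit elements of a 64-bit lane are its low halves), and hypothetical helper names (`uunpklo`, `uunpkhi`, `uzp1`, `insertHalf`) that mimic the AArch64ISD nodes rather than call into LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;
using V2i32 = std::array<uint32_t, 2>;
using V2i64 = std::array<uint64_t, 2>;

// UUNPKLO / UUNPKHI: zero-extend the low / high half of the lanes.
V2i64 uunpklo(const V4i32 &V) { return {V[0], V[1]}; }
V2i64 uunpkhi(const V4i32 &V) { return {V[2], V[3]}; }

// UZP1 at 32-bit granularity on two registers viewed as 2 x i64:
// keeps the even 32-bit elements, i.e. the low half of each 64-bit lane.
V4i32 uzp1(const V2i64 &A, const V2i64 &B) {
  return {static_cast<uint32_t>(A[0]), static_cast<uint32_t>(A[1]),
          static_cast<uint32_t>(B[0]), static_cast<uint32_t>(B[1])};
}

// Insert a half-size subvector at lane index Idx (0 or 2) of Vec0.
V4i32 insertHalf(const V4i32 &Vec0, const V2i32 &Sub, unsigned Idx) {
  assert((Idx == 0 || Idx == 2) && "only the low or high half is modelled");
  V2i64 Ext = {Sub[0], Sub[1]}; // stands in for ANY_EXTEND of the subvector
  if (Idx == 0)
    return uzp1(Ext, uunpkhi(Vec0)); // new low half, keep old high half
  return uzp1(uunpklo(Vec0), Ext);   // keep old low half, new high half
}
```

The half-size check in LowerINSERT_SUBVECTOR corresponds to the fixed 2-lane subvector here; chaining several inserts is what allows more than two operands to be concatenated.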

Looks good to me, but I'll defer the final LGTM to @paulwalker-arm and @efriedma! I learned something new from this patch too - didn't know you could add target ISD combines like that.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4714	nit: Perhaps better to rename Idx to something like NumSubVecElems, since it's not really an index now?

paulwalker-arm added inline comments. · Sep 7 2020, 5:51 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9110	As well as what Eli asked I also don't think InVT.isInteger is required because by INSERT_SUBVECTOR's definition the subvector must have the same element type as the result vector.
13037–13038	In its current form I don't think this is correct. The UUNPK instructions zero the even lanes and so you need to ensure the chosen child of the UZP1 honours this requirement. Do you know the exact sequence you expect to see? For example perhaps the pattern you're after is `UUNPKLO(UZP1(UUNPKLO(X), ??)) -> UUNPKLO(X)` or some `UZP1(UUNPKLO(UZP1(X,Y))... ->UZP1(X,Y)` like sequence.
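
A small counterexample makes the concern concrete. The sketch below is a lane-level model, assuming vscale = 1 and a little-endian lane layout; the helpers are hypothetical stand-ins for the AArch64ISD nodes. It shows that `unpk(uzp1(x)) -> x` only holds when the upper half of every lane of `x` is already zero, because the zero-extension in UUNPK clears those bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;
using V2i64 = std::array<uint64_t, 2>;

// UUNPKLO: zero-extend the low half of the lanes.
V2i64 uunpklo(const V4i32 &V) { return {V[0], V[1]}; }

// UZP1 at 32-bit granularity on two registers viewed as 2 x i64:
// keeps the low 32 bits of each 64-bit lane of A, then of B.
V4i32 uzp1(const V2i64 &A, const V2i64 &B) {
  return {static_cast<uint32_t>(A[0]), static_cast<uint32_t>(A[1]),
          static_cast<uint32_t>(B[0]), static_cast<uint32_t>(B[1])};
}

// Returns true if folding uunpklo(uzp1(X, Y)) to X would preserve the value.
bool unpkOfUzpIsIdentity(const V2i64 &X, const V2i64 &Y) {
  return uunpklo(uzp1(X, Y)) == X;
}
```

With high bits set in `X` the fold changes the value, which is why a combine needs to match a more specific pattern than `unpk(uzp1(x))`.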

kmclaughlin added a child revision: D87232: [SVE][CodeGen] Lower floating point -> integer conversions. · Sep 7 2020, 6:44 AM

Removed DAGCombine for UNPKHI/LO and added a new case for UZP1, so that we can check for the more specific case of uzp1(unpklo(uzp1(x, y)), z)
Removed an unnecessary check that InVT is an integer in LowerINSERT_SUBVECTOR
Renamed Idx in PromoteIntOp_CONCAT_VECTORS to OpNumElts
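
The two folds in performUzpCombine, `uzp1(unpklo(uzp1(x, y)), z) => uzp1(x, z)` and `uzp1(x, unpkhi(uzp1(y, z))) => uzp1(x, z)`, can be sanity-checked with a lane-level model. This is a sketch under assumptions (vscale = 1, little-endian lanes, hypothetical helper names), not a proof for all element types.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;
using V2i64 = std::array<uint64_t, 2>;

// UUNPKLO / UUNPKHI: zero-extend the low / high half of the lanes.
V2i64 uunpklo(const V4i32 &V) { return {V[0], V[1]}; }
V2i64 uunpkhi(const V4i32 &V) { return {V[2], V[3]}; }

// UZP1 at 32-bit granularity on two registers viewed as 2 x i64:
// keeps the low 32 bits of each 64-bit lane of A, then of B.
V4i32 uzp1(const V2i64 &A, const V2i64 &B) {
  return {static_cast<uint32_t>(A[0]), static_cast<uint32_t>(A[1]),
          static_cast<uint32_t>(B[0]), static_cast<uint32_t>(B[1])};
}

// uzp1(uunpklo(uzp1(x, y)), z) == uzp1(x, z)
bool loFoldHolds(const V2i64 &X, const V2i64 &Y, const V2i64 &Z) {
  return uzp1(uunpklo(uzp1(X, Y)), Z) == uzp1(X, Z);
}

// uzp1(x, uunpkhi(uzp1(y, z))) == uzp1(x, z)
bool hiFoldHolds(const V2i64 &X, const V2i64 &Y, const V2i64 &Z) {
  return uzp1(X, uunpkhi(uzp1(Y, Z))) == uzp1(X, Z);
}
```

Both folds are safe in this model because the outer UZP1 discards exactly the bits that the intermediate zero-extension cleared.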

Looks good assuming the new test doesn't throw up any surprises.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13031	Stylistically this function could do with a few blank lines and is perhaps clearer if the two UzpOp variables are named X and Z respectively.
llvm/test/CodeGen/AArch64/sve-split-trunc.ll
22	You're missing trunc_i32toi8 for the complete collection.

This revision is now accepted and ready to land. · Sep 9 2020, 5:39 AM

Closed by commit rGcd89f5c91b4b: [SVE][CodeGen] Legalisation of truncate for scalable vectors (authored by kmclaughlin). · Sep 10 2020, 3:38 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked 2 inline comments as done.

kmclaughlin added a commit: rGcd89f5c91b4b: [SVE][CodeGen] Legalisation of truncate for scalable vectors.

Thank you @paulwalker-arm, @eli.friedman and @david-arm for the reviews!

Diff 290927

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[... 4,694 lines hidden ...]
@@ SDValue DAGTypeLegalizer::PromoteIntOp_EXTRACT_SUBVECTOR(SDNode *N) {
   SDValue V0 = GetPromotedInteger(N->getOperand(0));
   MVT InVT = V0.getValueType().getSimpleVT();
   MVT OutVT = MVT::getVectorVT(InVT.getVectorElementType(),
                                N->getValueType(0).getVectorNumElements());
   SDValue Ext = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OutVT, V0, N->getOperand(1));
   return DAG.getNode(ISD::TRUNCATE, dl, N->getValueType(0), Ext);
 }

 SDValue DAGTypeLegalizer::PromoteIntOp_CONCAT_VECTORS(SDNode *N) {
   SDLoc dl(N);

+  EVT ResVT = N->getValueType(0);
   unsigned NumElems = N->getNumOperands();

+  if (ResVT.isScalableVector()) {
+    SDValue ResVec = DAG.getUNDEF(ResVT);
+
+    for (unsigned OpIdx = 0; OpIdx < NumElems; ++OpIdx) {
+      SDValue Op = N->getOperand(OpIdx);
+      unsigned OpNumElts = Op.getValueType().getVectorMinNumElements();
+      ResVec = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, ResVT, ResVec, Op,
+                           DAG.getIntPtrConstant(OpIdx * OpNumElts, dl));
+    }
+
+    return ResVec;
+  }
+
   EVT RetSclrTy = N->getValueType(0).getVectorElementType();

   SmallVector<SDValue, 8> NewOps;
   NewOps.reserve(NumElems);

   // For each incoming vector
   for (unsigned VecIdx = 0; VecIdx != NumElems; ++VecIdx) {
     SDValue Incoming = GetPromotedInteger(N->getOperand(VecIdx));
[... 14 lines hidden ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 956 lines hidden ...]
@@ for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
       setOperationAction(ISD::SRL, VT, Custom);
       setOperationAction(ISD::SRA, VT, Custom);
       if (VT.getScalarType() == MVT::i1) {
         setOperationAction(ISD::SETCC, VT, Custom);
         setOperationAction(ISD::TRUNCATE, VT, Custom);
         setOperationAction(ISD::CONCAT_VECTORS, VT, Legal);
       }
     }
   }

-  for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32})
+  for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
     setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
+    setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
+  }

   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

   for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
     if (isTypeLegal(VT)) {
       setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
       setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
[... 8,117 lines hidden ...]
 SDValue AArch64TargetLowering::LowerINSERT_SUBVECTOR(SDValue Op,
                                                      SelectionDAG &DAG) const {
   assert(Op.getValueType().isScalableVector() &&
          "Only expect to lower inserts into scalable vectors!");

   EVT InVT = Op.getOperand(1).getValueType();
   unsigned Idx = cast<ConstantSDNode>(Op.getOperand(2))->getZExtValue();

-  // We don't have any patterns for scalable vector yet.
-  if (InVT.isScalableVector())
+  if (InVT.isScalableVector()) {
+    SDLoc DL(Op);
+    EVT VT = Op.getValueType();
+
+    if (!isTypeLegal(VT) || !VT.isInteger())
+      return SDValue();
+
+    SDValue Vec0 = Op.getOperand(0);
+    SDValue Vec1 = Op.getOperand(1);
+
+    // Ensure the subvector is half the size of the main vector.
+    if (VT.getVectorElementCount() != (InVT.getVectorElementCount() * 2))
+      return SDValue();
+
+    // Extend elements of smaller vector...
+    EVT WideVT = InVT.widenIntegerVectorElementType(*(DAG.getContext()));
+    SDValue ExtVec = DAG.getNode(ISD::ANY_EXTEND, DL, WideVT, Vec1);
+
+    if (Idx == 0) {
+      SDValue HiVec0 = DAG.getNode(AArch64ISD::UUNPKHI, DL, WideVT, Vec0);
+      return DAG.getNode(AArch64ISD::UZP1, DL, VT, ExtVec, HiVec0);
+    } else if (Idx == InVT.getVectorMinNumElements()) {
+      SDValue LoVec0 = DAG.getNode(AArch64ISD::UUNPKLO, DL, WideVT, Vec0);
+      return DAG.getNode(AArch64ISD::UZP1, DL, VT, LoVec0, ExtVec);
+    }
+
     return SDValue();
+  }

   // This will be matched by custom code during ISelDAGToDAG.
   if (Idx == 0 && isPackedVectorType(InVT, DAG) && Op.getOperand(0).isUndef())
     return Op;

   return SDValue();
 }

 SDValue AArch64TargetLowering::LowerDIV(SDValue Op, SelectionDAG &DAG) const {
   EVT VT = Op.getValueType();

   if (useSVEForFixedLengthVectorVT(VT, /*OverrideNEON=*/true))
     return LowerFixedLengthVectorIntDivideToSVE(Op, DAG);

   assert(VT.isScalableVector() && "Expected a scalable vector.");

   bool Signed = Op.getOpcode() == ISD::SDIV;
   unsigned PredOpcode = Signed ? AArch64ISD::SDIV_PRED : AArch64ISD::UDIV_PRED;

   if (VT == MVT::nxv4i32 || VT == MVT::nxv2i64)
     return LowerToPredicatedOp(Op, DAG, PredOpcode);

   // SVE doesn't have i8 and i16 DIV operations; widen them to 32-bit
   // operations, and truncate the result.
   EVT WidenedVT;
   if (VT == MVT::nxv16i8)
     WidenedVT = MVT::nxv8i16;
   else if (VT == MVT::nxv8i16)
     WidenedVT = MVT::nxv4i32;
   else
     llvm_unreachable("Unexpected Custom DIV operation");

   SDLoc dl(Op);
[... 3,858 lines hidden ...]
@@ SDValue NewST1 =
                    S->getAlignment(), S->getMemOperand()->getFlags());
   SDValue OffsetPtr = DAG.getNode(ISD::ADD, DL, MVT::i64, BasePtr,
                                   DAG.getConstant(8, DL, MVT::i64));
   return DAG.getStore(NewST1.getValue(0), DL, SubVector1, OffsetPtr,
                       S->getPointerInfo(), S->getAlignment(),
                       S->getMemOperand()->getFlags());
 }

+static SDValue performUzpCombine(SDNode *N, SelectionDAG &DAG) {
+  SDLoc DL(N);
+  SDValue Op0 = N->getOperand(0);
+  SDValue Op1 = N->getOperand(1);
+  EVT ResVT = N->getValueType(0);
+
+  // uzp1(unpklo(uzp1(x, y)), z) => uzp1(x, z)
+  if (Op0.getOpcode() == AArch64ISD::UUNPKLO) {
+    if (Op0.getOperand(0).getOpcode() == AArch64ISD::UZP1) {
+      SDValue X = Op0.getOperand(0).getOperand(0);
+      return DAG.getNode(AArch64ISD::UZP1, DL, ResVT, X, Op1);
+    }
+  }
+
+  // uzp1(x, unpkhi(uzp1(y, z))) => uzp1(x, z)
+  if (Op1.getOpcode() == AArch64ISD::UUNPKHI) {
+    if (Op1.getOperand(0).getOpcode() == AArch64ISD::UZP1) {
+      SDValue Z = Op1.getOperand(0).getOperand(1);
+      return DAG.getNode(AArch64ISD::UZP1, DL, ResVT, Op0, Z);
+    }
+  }
+
+  return SDValue();
+}
+
 /// Target-specific DAG combine function for post-increment LD1 (lane) and
 /// post-increment LD1R.
 static SDValue performPostLD1Combine(SDNode *N,
                                      TargetLowering::DAGCombinerInfo &DCI,
                                      bool IsLaneOp) {
   if (DCI.isBeforeLegalizeOps())
     return SDValue();

[... 1,325 lines hidden ...]
@@ SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
   case AArch64ISD::TBZ:
     return performTBZCombine(N, DCI, DAG);
   case AArch64ISD::CSEL:
     return performCONDCombine(N, DCI, DAG, 2, 3);
   case AArch64ISD::DUP:
     return performPostLD1Combine(N, DCI, false);
   case AArch64ISD::NVCAST:
     return performNVCASTCombine(N);
+  case AArch64ISD::UZP1:
+    return performUzpCombine(N, DAG);
   case ISD::INSERT_VECTOR_ELT:
     return performPostLD1Combine(N, DCI, true);
   case ISD::INTRINSIC_VOID:
   case ISD::INTRINSIC_W_CHAIN:
     switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
     case Intrinsic::aarch64_sve_prfb_gather_scalar_offset:
       return combineSVEPrefetchVecBaseImmOff(N, DAG, 1 /*=ScalarSizeInBytes*/);
     case Intrinsic::aarch64_sve_prfh_gather_scalar_offset:
[... 1,415 lines hidden ...]

llvm/test/CodeGen/AArch64/sve-split-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				define <vscale x 16 x i8> @trunc_i16toi8(<vscale x 16 x i16> %in) {
				; CHECK-LABEL: trunc_i16toi8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%out = trunc <vscale x 16 x i16> %in to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 16 x i8> @trunc_i32toi8(<vscale x 16 x i32> %in) {
				; CHECK-LABEL: trunc_i32toi8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.h, z2.h, z3.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: uzp1 z0.b, z0.b, z2.b
				; CHECK-NEXT: ret
				%out = trunc <vscale x 16 x i32> %in to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @trunc_i32toi16(<vscale x 8 x i32> %in) {
				; CHECK-LABEL: trunc_i32toi16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%out = trunc <vscale x 8 x i32> %in to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 4 x i32> @trunc_i64toi32(<vscale x 4 x i64> %in) {
				; CHECK-LABEL: trunc_i64toi32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%out = trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 8 x i16> @trunc_i64toi16(<vscale x 8 x i64> %in) {
				; CHECK-LABEL: trunc_i64toi16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z2.s, z2.s, z3.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: ret
				%out = trunc <vscale x 8 x i64> %in to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 16 x i8> @trunc_i64toi8(<vscale x 16 x i64> %in) {
				; CHECK-LABEL: trunc_i64toi8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uzp1 z6.s, z6.s, z7.s
				; CHECK-NEXT: uzp1 z4.s, z4.s, z5.s
				; CHECK-NEXT: uzp1 z2.s, z2.s, z3.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: uzp1 z1.h, z4.h, z6.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%out = trunc <vscale x 16 x i64> %in to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %out
				}
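
The CHECK lines in this test use one uzp1 per pair of input registers at each halving of the element width. As a rough model of that pattern (an observation from the expected output, not a description of the legaliser's internals), the sketch below assumes vscale = 1 and uses hypothetical names (`Reg`, `uzp1Half`, `truncTree`) to narrow a set of registers pairwise and count the operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Reg = std::vector<uint64_t>; // lane values; lane width is tracked by the caller

// Model of UZP1 at half the current lane width: keep the low half of every
// lane of A, then of B, producing twice as many (narrower) lanes.
Reg uzp1Half(const Reg &A, const Reg &B, unsigned LaneBits) {
  uint64_t Mask = (uint64_t{1} << (LaneBits / 2)) - 1;
  Reg Out;
  for (uint64_t V : A)
    Out.push_back(V & Mask);
  for (uint64_t V : B)
    Out.push_back(V & Mask);
  return Out;
}

// Pairwise-narrow the input registers until the lanes are OutBits wide.
// Expects InBits / OutBits input registers (both powers of two).
// Returns the final register and the number of uzp1 operations used.
std::pair<Reg, unsigned> truncTree(std::vector<Reg> Regs, unsigned InBits,
                                   unsigned OutBits) {
  unsigned NumUzp = 0;
  for (unsigned Bits = InBits; Bits > OutBits; Bits /= 2) {
    assert(Regs.size() % 2 == 0 && "registers must pair up at every step");
    std::vector<Reg> Next;
    for (std::size_t I = 0; I + 1 < Regs.size(); I += 2) {
      Next.push_back(uzp1Half(Regs[I], Regs[I + 1], Bits));
      ++NumUzp;
    }
    Regs = std::move(Next);
  }
  assert(Regs.size() == 1 && "expected a single result register");
  return {Regs.front(), NumUzp};
}
```

For eight input registers narrowed from 64-bit to 8-bit lanes the model uses seven uzp1 steps, the same count as in trunc_i64toi8.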

This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Legalisation of truncate for scalable vectors
ClosedPublic
