This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Custom ISel for fixed length extract/insert_subvector.
ClosedPublic

Authored by paulwalker-arm on Jun 30 2020, 5:58 AM.

Details

Reviewers

efriedma
cameron.mcinally

Commits

rGfb75451775f8: [SVE] Custom ISel for fixed length extract/insert_subvector.

Summary

We use extract_subvector and insert_subvector to "cast" between
fixed length and scalable vectors. This patch adds custom C++
based ISel for the following cases:

fixed_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
scalable_vector = ISD::INSERT_SUBVECTOR undef(scalable_vector), fixed_vector, 0

These are selected to either EXTRACT_SUBREG/INSERT_SUBREG for NEON sized
vectors or COPY_TO_REGCLASS otherwise.
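
As a rough illustration of the selection rule in the summary, the sketch below models the choice in plain C++ outside of LLVM. `CastOpcode`, `CastSelection` and `selectCast` are hypothetical names invented for this example; only the opcode names and the `dsub`/`zsub` sub-register indices are taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the machine opcodes named in the summary.
enum class CastOpcode { ExtractSubreg, InsertSubreg, CopyToRegClass };

// Hypothetical result of selecting a "cast": the opcode plus the name of
// the sub-register index (empty when no sub-register index applies).
struct CastSelection {
  CastOpcode Opcode;
  std::string SubRegIndex;
};

// Choose how to move a fixed length vector of FixedBits bits out of
// (IsExtract) or into (!IsExtract) an SVE register.
CastSelection selectCast(unsigned FixedBits, bool IsExtract) {
  CastOpcode SubRegOp =
      IsExtract ? CastOpcode::ExtractSubreg : CastOpcode::InsertSubreg;
  switch (FixedBits) {
  case 64: // NEON D register sized
    return {SubRegOp, "dsub"};
  case 128: // NEON Q register sized
    return {SubRegOp, "zsub"};
  default: // Wider than NEON: no matching sub-register index exists.
    return {CastOpcode::CopyToRegClass, ""};
  }
}
```

For example, `selectCast(256, true)` models extracting a 256-bit fixed length vector, which falls through to the COPY_TO_REGCLASS case.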

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

paulwalker-arm created this revision. Jun 30 2020, 5:58 AM

Herald added a reviewer: efriedma. Jun 30 2020, 5:58 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, psnobl, jfb and 3 others.

paulwalker-arm added a reviewer: cameron.mcinally. Jun 30 2020, 6:00 AM

Harbormaster failed remote builds in B62313: Diff 274441! Jun 30 2020, 7:34 AM

efriedma added inline comments. Jun 30 2020, 12:31 PM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	I'm not sure I understand the issue here. Is the problem just that for a pattern, you need to write the type of the result? I don't think there's any problem with writing a pattern involving a type that isn't always legal; if the type isn't legal, it just won't match.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8714	The comment here is really terse, and I'm not sure this is handling all the cases we need to. In particular, I'm not sure this correctly handles types like nxv2f32.

paulwalker-arm marked an inline comment as done. Jun 30 2020, 1:50 PM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	I believe the issue is the runtime nature of the >128bit fixed length vectors means they are not mapped to any register class, which prevents pattern based matching. We did investigate using hwmodes but it didn't prove to be a viable solution.

efriedma added inline comments. Jun 30 2020, 2:16 PM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	"I believe the issue is the runtime nature of the >128bit fixed length vectors means they are not mapped to any register class" You could change that in AArch64RegisterInfo.td, if you wanted to. I don't think that would cause any issues; it's okay if some of the types in the list aren't legal for all subtargets.

paulwalker-arm marked an inline comment as not done. Jun 30 2020, 2:50 PM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	Thanks for the tip, I'll take a look and see how it works out.

paulwalker-arm marked 3 inline comments as done. Jul 1 2020, 6:04 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	I tried adding the extra fixed length vector MVTs to ZPRClass but it didn't really work out. Firstly I needed to fix up a bunch of patterns due to "Could not infer all types in pattern" build errors. This might be down to ZPRClass being sized as 128, which looks especially weird once it starts to contain a lot of MVTs that are known bigger than 128. Ultimately though the failing patterns are easily fixed but I guess potentially burdensome since 99.9% of the patterns shouldn't need to care about fixed length SVE. After this the extract_subvector patterns still do not match. I've tracked this down to SDTSubVecExtract's usage of SDTCisSubVecOfVec which is not strictly true given the newly expanded definition of extract_subvector. I cannot just update SDTCisSubVecOfVec because it's used by other operations where we don't want to allow a mixture of fixed length and scalable vectors (e.g. concat_vector). I can add a new variant of SDTCisSubVecOfVec for use by extract_subvector and insert_subvector, but given the minimal usage I'm not sure it's worth it. What do you think?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8714	I've created a helper function to detect unpacked vector types and used it here so that only fixed length vector extracts from packed scalable types are considered legal.

Updated selection comments to fully explain the rationale. Updated lowering code to be more cautious in what is considered legal.
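
To make the packed/unpacked distinction concrete, here is a minimal standalone sketch of the check. `ToyVectorType` is a hypothetical, simplified stand-in for `llvm::EVT`, and the constant mirrors `AArch64::SVEBitsPerBlock`, assumed here to be 128; the real helper in the diff also asserts that the type is legal, which this toy model does not attempt.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for llvm::EVT.
struct ToyVectorType {
  bool Scalable;        // true for nxv* types
  unsigned MinElements; // element count (minimum count when scalable)
  unsigned ElementBits; // width of one element in bits
  unsigned knownMinSizeInBits() const { return MinElements * ElementBits; }
};

// Assumed to match AArch64::SVEBitsPerBlock.
constexpr unsigned SVEBitsPerBlock = 128;

// A type is "packed" when its elements fill the low bits of the register
// with no gaps: any fixed length vector, or a scalable vector whose known
// minimum size equals one full SVE block.
bool isPackedVectorType(const ToyVectorType &VT) {
  return !VT.Scalable || VT.knownMinSizeInBits() == SVEBitsPerBlock;
}
```

Under this model nxv8f16 (8 x 16 bits) is packed, while nxv2f32 (2 x 32 bits), the type raised in the review, is unpacked and so is rejected by the lowering code.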

Harbormaster failed remote builds in B62494: Diff 274778! Jul 1 2020, 7:01 AM

efriedma added inline comments. Jul 1 2020, 12:30 PM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	I guess it's fine to leave as-is, in that case.
3395	There isn't any corresponding code for INSERT_SUBVECTOR in AArch64ISelLowering?

paulwalker-arm marked an inline comment as done. Jul 1 2020, 1:18 PM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3395	Unlike EXTRACT_SUBVECTOR, AArch64ISelLowering didn't have any custom lowering for INSERT_SUBVECTOR for any vector types. For this reason I assumed the defaults were fine. My new usage when lowering fixed length to SVE only uses the legal variants so doesn't introduce any new requirements.

efriedma added inline comments. Jul 1 2020, 2:15 PM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3376	Do we need the isPackedVectorType check here as well?
3395	In theory, you need exactly the same code. Realistically, I'm not sure target-independent code will actually ever create an insert_subvector with a non-zero index. I briefly tried grepping through the relevant code, and couldn't find anything. I guess we could ignore the issue for now.

paulwalker-arm marked 2 inline comments as done. Jul 1 2020, 3:45 PM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3376	The code in LowerEXTRACT_SUBVECTOR should preclude that case, but I can add some asserts.

Added an implementation for LowerINSERT_SUBVECTOR along with more asserts.
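
The conditions under which the custom selection code treats a node as a "cast" can be summarised with a small standalone model. This is only a sketch: `ExtractInfo`, `InsertInfo` and the two predicates are hypothetical names, and the fields stand in for queries the real code makes on the SelectionDAG node.

```cpp
#include <cassert>

// Hypothetical summary of an extract_subvector node.
struct ExtractInfo {
  unsigned Index;      // constant index operand
  bool ResultScalable; // is the extracted type scalable?
  bool SourceScalable; // is the source vector type scalable?
};

// Hypothetical summary of an insert_subvector node.
struct InsertInfo {
  unsigned Index;         // constant index operand
  bool ContainerIsUndef;  // is the vector being inserted into undef?
  bool ResultScalable;    // is the result type scalable?
  bool SubvectorScalable; // is the inserted vector type scalable?
};

// Custom selection only handles a fixed length extract from a scalable
// vector at index zero; everything else is left to normal isel.
bool isCastLikeExtract(const ExtractInfo &N) {
  if (N.Index != 0)
    return false;
  return !N.ResultScalable && N.SourceScalable;
}

// Custom selection only handles inserting a fixed length vector into an
// undef scalable vector at index zero.
bool isCastLikeInsert(const InsertInfo &N) {
  if (N.Index != 0 || !N.ContainerIsUndef)
    return false;
  return N.ResultScalable && !N.SubvectorScalable;
}
```

The packed-type requirement is not modelled here because, as noted in the thread, it is enforced earlier by the lowering code and only asserted during selection.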

paulwalker-arm marked 2 inline comments as done. Jul 2 2020, 8:09 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	Thanks. I'll circle back round to this after we've enough functionality for fixed length to be usable.
3395	To be consistent I've provided an implementation. We're not really in a position to properly test it, although it defaults to expand for all the cases where tests don't exist.

Harbormaster failed remote builds in B62692: Diff 275125! Jul 2 2020, 9:10 AM

cameron.mcinally marked an inline comment as done. Jul 6 2020, 10:11 AM

cameron.mcinally added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3244	No objections, but I do like Eli's idea for the long term. This would be a step in the right direction for passing fixed-width arguments. Having a restricted IR for fixed-width isn't ideal.
llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll
26	Nit: could probably mark the loads/stores volatile to avoid the branch.

paulwalker-arm marked an inline comment as done. Jul 6 2020, 4:12 PM

paulwalker-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll
26	I'm not sure how this helps. The reason for the branch is to force a block boundary to ensure the extract_subvector resulting from lowering the load is not combined with the insert_subvector that's created when lowering the store.

cameron.mcinally added inline comments. Jul 7 2020, 9:29 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll
26	Ok, that makes sense. Let me ask the opposite though -- if the load and store are volatile, will the fixed-width lowering honor the volatile?
	x = load volatile p
	y = fneg x
	x = load volatile p
	store *p, x
	Unlikely a problem considering the loads made it to the backend, but would be good to confirm.

paulwalker-arm marked an inline comment as done. Jul 7 2020, 9:51 AM

paulwalker-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll
26	The resulting masked memory operation takes the same MachineMemOperand as the original fixed length operation, so the volatile flag will be maintained.

LGTM

This revision is now accepted and ready to land. Jul 7 2020, 12:28 PM

cameron.mcinally accepted this revision. Jul 7 2020, 12:41 PM

Closed by commit rGfb75451775f8: [SVE] Custom ISel for fixed length extract/insert_subvector. (authored by paulwalker-arm). Jul 8 2020, 2:51 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	103 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	73 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll	88 lines

Diff 276355

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

Show First 20 Lines • Show All 3,234 Lines • ▼ Show 20 Lines	SDNode *N2 = CurDAG->getMachineNode(AArch64::ADDXrr, DL, MVT::i64,
{SDValue(N1, 0), N->getOperand(2)});		{SDValue(N1, 0), N->getOperand(2)});
SDNode *N3 = CurDAG->getMachineNode(		SDNode *N3 = CurDAG->getMachineNode(
AArch64::ADDG, DL, MVT::i64,		AArch64::ADDG, DL, MVT::i64,
{SDValue(N2, 0), CurDAG->getTargetConstant(0, DL, MVT::i64),		{SDValue(N2, 0), CurDAG->getTargetConstant(0, DL, MVT::i64),
CurDAG->getTargetConstant(TagOffset, DL, MVT::i64)});		CurDAG->getTargetConstant(TagOffset, DL, MVT::i64)});
ReplaceNode(N, N3);		ReplaceNode(N, N3);
}		}

		// NOTE: We cannot use EXTRACT_SUBREG in all cases because the fixed length
		// vector types larger than NEON don't have a matching SubRegIndex.
		static SDNode *extractSubReg(SelectionDAG *DAG, EVT VT, SDValue V) {
		assert(V.getValueType().isScalableVector() &&
		V.getValueType().getSizeInBits().getKnownMinSize() ==
		AArch64::SVEBitsPerBlock &&
		"Expected to extract from a packed scalable vector!");
		assert(VT.isFixedLengthVector() &&
		"Expected to extract a fixed length vector!");

		SDLoc DL(V);
		switch (VT.getSizeInBits()) {
		case 64: {
		auto SubReg = DAG->getTargetConstant(AArch64::dsub, DL, MVT::i32);
		return DAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, VT, V, SubReg);
		}
		case 128: {
		auto SubReg = DAG->getTargetConstant(AArch64::zsub, DL, MVT::i32);
		return DAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, VT, V, SubReg);
		}
		default: {
		auto RC = DAG->getTargetConstant(AArch64::ZPRRegClassID, DL, MVT::i64);
		return DAG->getMachineNode(TargetOpcode::COPY_TO_REGCLASS, DL, VT, V, RC);
		}
		}
		}

		// NOTE: We cannot use INSERT_SUBREG in all cases because the fixed length
		// vector types larger than NEON don't have a matching SubRegIndex.
		static SDNode *insertSubReg(SelectionDAG *DAG, EVT VT, SDValue V) {
		assert(VT.isScalableVector() &&
		VT.getSizeInBits().getKnownMinSize() == AArch64::SVEBitsPerBlock &&
		"Expected to insert into a packed scalable vector!");
		assert(V.getValueType().isFixedLengthVector() &&
		"Expected to insert a fixed length vector!");

		SDLoc DL(V);
		switch (V.getValueType().getSizeInBits()) {
		case 64: {
		auto SubReg = DAG->getTargetConstant(AArch64::dsub, DL, MVT::i32);
		auto Container = DAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
		return DAG->getMachineNode(TargetOpcode::INSERT_SUBREG, DL, VT,
		SDValue(Container, 0), V, SubReg);
		}
		case 128: {
		auto SubReg = DAG->getTargetConstant(AArch64::zsub, DL, MVT::i32);
		auto Container = DAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
		return DAG->getMachineNode(TargetOpcode::INSERT_SUBREG, DL, VT,
		SDValue(Container, 0), V, SubReg);
		}
		default: {
		auto RC = DAG->getTargetConstant(AArch64::ZPRRegClassID, DL, MVT::i64);
		return DAG->getMachineNode(TargetOpcode::COPY_TO_REGCLASS, DL, VT, V, RC);
		}
		}
		}

void AArch64DAGToDAGISel::Select(SDNode *Node) {		void AArch64DAGToDAGISel::Select(SDNode *Node) {
// If we have a custom node, we already have selected!		// If we have a custom node, we already have selected!
if (Node->isMachineOpcode()) {		if (Node->isMachineOpcode()) {
LLVM_DEBUG(errs() << "== "; Node->dump(CurDAG); errs() << "\n");		LLVM_DEBUG(errs() << "== "; Node->dump(CurDAG); errs() << "\n");
Node->setNodeId(-1);		Node->setNodeId(-1);
return;		return;
}		}

▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	if (tryHighFPExt(Node))
return;		return;
break;		break;

case ISD::OR:		case ISD::OR:
if (tryBitfieldInsertOp(Node))		if (tryBitfieldInsertOp(Node))
return;		return;
break;		break;

		case ISD::EXTRACT_SUBVECTOR: {
		// Bail when not a "cast" like extract_subvector.
		if (cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue() != 0)
		break;

		// Bail when normal isel can do the job.
		EVT InVT = Node->getOperand(0).getValueType();
		if (VT.isScalableVector() || InVT.isFixedLengthVector())
		break;

		// NOTE: We can only get here when doing fixed length SVE code generation.
		// We do manual selection because the types involved are not linked to real
		// registers (despite being legal) and must be coerced into SVE registers.
		//
		// NOTE: If the above changes, be aware that selection will still not work
		// because the td definition of extract_vector does not support extracting
		// a fixed length vector from a scalable vector.

		ReplaceNode(Node, extractSubReg(CurDAG, VT, Node->getOperand(0)));
		return;
		}

		case ISD::INSERT_SUBVECTOR: {
		// Bail when not a "cast" like insert_subvector.
		if (cast<ConstantSDNode>(Node->getOperand(2))->getZExtValue() != 0)
		break;
		if (!Node->getOperand(0).isUndef())
		break;

		// Bail when normal isel should do the job.
		EVT InVT = Node->getOperand(1).getValueType();
		if (VT.isFixedLengthVector() || InVT.isScalableVector())
		break;

		// NOTE: We can only get here when doing fixed length SVE code generation.
		// We do manual selection because the types involved are not linked to real
		// registers (despite being legal) and must be coerced into SVE registers.
		//
		// NOTE: If the above changes, be aware that selection will still not work
		// because the td definition of insert_vector does not support inserting a
		// fixed length vector into a scalable vector.

		ReplaceNode(Node, insertSubReg(CurDAG, VT, Node->getOperand(1)));
		return;
		}

case ISD::Constant: {		case ISD::Constant: {
// Materialize zero constants as copies from WZR/XZR. This allows		// Materialize zero constants as copies from WZR/XZR. This allows
// the coalescer to propagate these into other instructions.		// the coalescer to propagate these into other instructions.
ConstantSDNode *ConstNode = cast<ConstantSDNode>(Node);		ConstantSDNode *ConstNode = cast<ConstantSDNode>(Node);
if (ConstNode->isNullValue()) {		if (ConstNode->isNullValue()) {
if (VT == MVT::i32) {		if (VT == MVT::i32) {
SDValue New = CurDAG->getCopyFromReg(		SDValue New = CurDAG->getCopyFromReg(
CurDAG->getEntryNode(), SDLoc(Node), AArch64::WZR, MVT::i32);		CurDAG->getEntryNode(), SDLoc(Node), AArch64::WZR, MVT::i32);
▲ Show 20 Lines • Show All 1,505 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 844 Lines • ▼ Show 20 Lines	private:
SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSPLAT_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDUPQLane(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG,		SDValue LowerToPredicatedOp(SDValue Op, SelectionDAG &DAG,
unsigned NewOp) const;		unsigned NewOp) const;
SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerINSERT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVectorSRA_SRL_SHL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCTPOP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTPOP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerF128Call(SDValue Op, SelectionDAG &DAG,		SDValue LowerF128Call(SDValue Op, SelectionDAG &DAG,
RTLIB::Libcall Call) const;		RTLIB::Libcall Call) const;
SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const;
▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines
EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,		EnableOptimizeLogicalImm("aarch64-enable-logical-imm", cl::Hidden,
cl::desc("Enable AArch64 logical imm instruction "		cl::desc("Enable AArch64 logical imm instruction "
"optimization"),		"optimization"),
cl::init(true));		cl::init(true));

/// Value type used for condition codes.		/// Value type used for condition codes.
static const MVT MVT_CC = MVT::i32;		static const MVT MVT_CC = MVT::i32;

		/// Returns true if VT's elements occupy the lowest bit positions of its
		/// associated register class without any intervening space.
		///
		/// For example, nxv2f16, nxv4f16 and nxv8f16 are legal types that belong to the
		/// same register class, but only nxv8f16 can be treated as a packed vector.
		static inline bool isPackedVectorType(EVT VT, SelectionDAG &DAG) {
		assert(VT.isVector() && DAG.getTargetLoweringInfo().isTypeLegal(VT) &&
		"Expected legal vector type!");
		return VT.isFixedLengthVector() ||
		VT.getSizeInBits().getKnownMinSize() == AArch64::SVEBitsPerBlock;
		}

AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,		AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
const AArch64Subtarget &STI)		const AArch64Subtarget &STI)
: TargetLowering(TM), Subtarget(&STI) {		: TargetLowering(TM), Subtarget(&STI) {
// AArch64 doesn't have comparisons which set GPRs or setcc instructions, so		// AArch64 doesn't have comparisons which set GPRs or setcc instructions, so
// we have to make something up. Arbitrarily, choose ZeroOrOne.		// we have to make something up. Arbitrarily, choose ZeroOrOne.
setBooleanContents(ZeroOrOneBooleanContent);		setBooleanContents(ZeroOrOneBooleanContent);
// When comparing vectors the result sets the different elements in the		// When comparing vectors the result sets the different elements in the
// vector to all-one or all-zero.		// vector to all-one or all-zero.
▲ Show 20 Lines • Show All 776 Lines • ▼ Show 20 Lines	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
}		}

if (Subtarget->hasSVE()) {		if (Subtarget->hasSVE()) {
// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a		// FIXME: Add custom lowering of MLOAD to handle different passthrus (not a
// splat of 0 or undef) once vector selects supported in SVE codegen. See		// splat of 0 or undef) once vector selects supported in SVE codegen. See
// D68877 for more details.		// D68877 for more details.
for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {		for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::SDIV, VT, Custom);		setOperationAction(ISD::SDIV, VT, Custom);
setOperationAction(ISD::UDIV, VT, Custom);		setOperationAction(ISD::UDIV, VT, Custom);
setOperationAction(ISD::SMIN, VT, Custom);		setOperationAction(ISD::SMIN, VT, Custom);
setOperationAction(ISD::UMIN, VT, Custom);		setOperationAction(ISD::UMIN, VT, Custom);
setOperationAction(ISD::SMAX, VT, Custom);		setOperationAction(ISD::SMAX, VT, Custom);
setOperationAction(ISD::UMAX, VT, Custom);		setOperationAction(ISD::UMAX, VT, Custom);
setOperationAction(ISD::SHL, VT, Custom);		setOperationAction(ISD::SHL, VT, Custom);
setOperationAction(ISD::SRL, VT, Custom);		setOperationAction(ISD::SRL, VT, Custom);
setOperationAction(ISD::SRA, VT, Custom);		setOperationAction(ISD::SRA, VT, Custom);
if (VT.getScalarType() == MVT::i1)		if (VT.getScalarType() == MVT::i1)
setOperationAction(ISD::SETCC, VT, Custom);		setOperationAction(ISD::SETCC, VT, Custom);
} else {
for (auto VT : { MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32 })
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
}		}
}		}

		for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32})
		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);

setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {		for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
}		}
}		}

// NOTE: Currently this has to happen after computeRegisterProperties rather		// NOTE: Currently this has to happen after computeRegisterProperties rather
// than the preferred option of combining it with the addRegisterClass call.		// than the preferred option of combining it with the addRegisterClass call.
if (useSVEForFixedLengthVectors()) {		if (useSVEForFixedLengthVectors()) {
▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines

void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {		void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");		assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");

// By default everything must be expanded.		// By default everything must be expanded.
for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)		for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
setOperationAction(Op, VT, Expand);		setOperationAction(Op, VT, Expand);

// EXTRACT_SUBVECTOR/INSERT_SUBVECTOR are used to "cast" between scalable		// We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one.
// and fixed length vector types, although with the current level of support
// only the former is exercised.
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);

// Lower fixed length vector operations to scalable equivalents.		// Lower fixed length vector operations to scalable equivalents.
setOperationAction(ISD::ADD, VT, Custom);		setOperationAction(ISD::ADD, VT, Custom);
setOperationAction(ISD::FADD, VT, Custom);		setOperationAction(ISD::FADD, VT, Custom);
setOperationAction(ISD::LOAD, VT, Custom);		setOperationAction(ISD::LOAD, VT, Custom);
setOperationAction(ISD::STORE, VT, Custom);		setOperationAction(ISD::STORE, VT, Custom);
}		}
▲ Show 20 Lines • Show All 2,413 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
return LowerBUILD_VECTOR(Op, DAG);		return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
return LowerVECTOR_SHUFFLE(Op, DAG);		return LowerVECTOR_SHUFFLE(Op, DAG);
case ISD::SPLAT_VECTOR:		case ISD::SPLAT_VECTOR:
return LowerSPLAT_VECTOR(Op, DAG);		return LowerSPLAT_VECTOR(Op, DAG);
case ISD::EXTRACT_SUBVECTOR:		case ISD::EXTRACT_SUBVECTOR:
return LowerEXTRACT_SUBVECTOR(Op, DAG);		return LowerEXTRACT_SUBVECTOR(Op, DAG);
		case ISD::INSERT_SUBVECTOR:
		return LowerINSERT_SUBVECTOR(Op, DAG);
case ISD::SDIV:		case ISD::SDIV:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::SDIV_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::SDIV_PRED);
case ISD::UDIV:		case ISD::UDIV:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::UDIV_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::UDIV_PRED);
case ISD::SMIN:		case ISD::SMIN:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::SMIN_MERGE_OP1);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::SMIN_MERGE_OP1);
case ISD::UMIN:		case ISD::UMIN:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::UMIN_MERGE_OP1);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::UMIN_MERGE_OP1);
▲ Show 20 Lines • Show All 5,194 Lines • ▼ Show 20 Lines	AArch64TargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,

// For extractions, we just return the result directly.		// For extractions, we just return the result directly.
return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtrTy, WideVec,		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtrTy, WideVec,
Op.getOperand(1));		Op.getOperand(1));
}		}

SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,		SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
assert(!Op.getValueType().isScalableVector() &&		assert(Op.getValueType().isFixedLengthVector() &&
"Unexpected scalable type for custom lowering EXTRACT_SUBVECTOR");		"Only cases that extract a fixed length vector are supported!");

EVT VT = Op.getOperand(0).getValueType();		EVT InVT = Op.getOperand(0).getValueType();
SDLoc dl(Op);		unsigned Idx = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
// Just in case...		unsigned Size = Op.getValueSizeInBits();
if (!VT.isVector())
return SDValue();

ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Op.getOperand(1));		if (InVT.isScalableVector()) {
if (!Cst)		// This will be matched by custom code during ISelDAGToDAG.
return SDValue();		if (Idx == 0 && isPackedVectorType(InVT, DAG))
unsigned Val = Cst->getZExtValue();		return Op;

unsigned Size = Op.getValueSizeInBits();		return SDValue();
		}

// This will get lowered to an appropriate EXTRACT_SUBREG in ISel.		// This will get lowered to an appropriate EXTRACT_SUBREG in ISel.
if (Val == 0)		if (Idx == 0 && InVT.getSizeInBits() <= 128)
return Op;		return Op;
		efriedma: The comment here is really terse, and I'm not sure this is handling all the cases we need to. In particular, I'm not sure this correctly handles types like nxv2f32.
		paulwalker-arm: I've created a helper function to detect unpacked vector types and used it here so that only fixed length vector extracts from packed scalable types are considered legal.

// If this is extracting the upper 64-bits of a 128-bit vector, we match		// If this is extracting the upper 64-bits of a 128-bit vector, we match
// that directly.		// that directly.
if (Size == 64 && Val * VT.getScalarSizeInBits() == 64)		if (Size == 64 && Idx * InVT.getScalarSizeInBits() == 64)
		return Op;

		return SDValue();
		}

		SDValue AArch64TargetLowering::LowerINSERT_SUBVECTOR(SDValue Op,
		SelectionDAG &DAG) const {
		assert(Op.getValueType().isScalableVector() &&
		"Only expect to lower inserts into scalable vectors!");

		EVT InVT = Op.getOperand(1).getValueType();
		unsigned Idx = cast<ConstantSDNode>(Op.getOperand(2))->getZExtValue();

		// We don't have any patterns for scalable vector yet.
		if (InVT.isScalableVector() || !useSVEForFixedLengthVectorVT(InVT))
		return SDValue();

		// This will be matched by custom code during ISelDAGToDAG.
		if (Idx == 0 && isPackedVectorType(InVT, DAG) && Op.getOperand(0).isUndef())
return Op;		return Op;

return SDValue();		return SDValue();
}		}

bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {		bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {
if (VT.getVectorNumElements() == 4 &&		if (VT.getVectorNumElements() == 4 &&
(VT.is128BitVector() || VT.is64BitVector())) {		(VT.is128BitVector() || VT.is64BitVector())) {
(remaining unchanged lines elided)
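
The legality checks in the new LowerEXTRACT_SUBVECTOR and LowerINSERT_SUBVECTOR can be modelled outside of LLVM as a small decision function. The following is only a sketch under stated assumptions: VecType, Action, isPacked, lowerExtractSubvector and lowerInsertSubvector are hypothetical stand-ins rather than LLVM APIs, and "packed" is assumed to mean a fixed length type or a scalable type whose known minimum size is 128 bits, which is why a type like nxv2f32 is rejected.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::EVT; only the fields the checks need.
struct VecType {
  bool Scalable;       // true for <vscale x N x T>
  unsigned MinBits;    // total size in bits (known minimum for scalable types)
  unsigned ScalarBits; // size of one element in bits
};

// Assumption: a packed type is fixed length, or a scalable type that fills a
// whole 128-bit SVE block (so nxv4f32 is packed, nxv2f32 is not).
bool isPacked(const VecType &VT) { return !VT.Scalable || VT.MinBits == 128; }

// KeepForISel models "return Op"; Expand models "return SDValue()".
enum class Action { KeepForISel, Expand };

Action lowerExtractSubvector(const VecType &InVT, unsigned ResultBits,
                             unsigned Idx) {
  if (InVT.Scalable)
    // Matched by custom code during ISelDAGToDAG.
    return (Idx == 0 && isPacked(InVT)) ? Action::KeepForISel : Action::Expand;

  // Low part of a NEON sized vector: becomes an EXTRACT_SUBREG.
  if (Idx == 0 && InVT.MinBits <= 128)
    return Action::KeepForISel;

  // Upper 64 bits of a 128-bit vector are matched directly.
  if (ResultBits == 64 && Idx * InVT.ScalarBits == 64)
    return Action::KeepForISel;

  return Action::Expand;
}

// UsesSVE models useSVEForFixedLengthVectorVT(SubVT); DestIsUndef models
// Op.getOperand(0).isUndef().
Action lowerInsertSubvector(const VecType &SubVT, bool UsesSVE,
                            bool DestIsUndef, unsigned Idx) {
  if (SubVT.Scalable || !UsesSVE)
    return Action::Expand;
  return (Idx == 0 && isPacked(SubVT) && DestIsUndef) ? Action::KeepForISel
                                                      : Action::Expand;
}
```

In this model, extracting a 256-bit fixed length vector at index 0 from an nxv4i32 source is kept for instruction selection, while the same extract from nxv2f32 is expanded.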

llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll

This file was added.

				; RUN: llc -aarch64-sve-vector-bits-min=128 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefix=NO_SVE
				; RUN: llc -aarch64-sve-vector-bits-min=256 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK
				; RUN: llc -aarch64-sve-vector-bits-min=384 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK
				; RUN: llc -aarch64-sve-vector-bits-min=512 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=640 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=768 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=896 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1024 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1152 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1280 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1408 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1536 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1664 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1792 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=1920 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
				; RUN: llc -aarch64-sve-vector-bits-min=2048 -aarch64-enable-atomic-cfg-tidy=false < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048

				; Test we can code generate patterns of the form:
				; fixed_length_vector = ISD::EXTRACT_SUBVECTOR scalable_vector, 0
				; scalable_vector = ISD::INSERT_SUBVECTOR scalable_vector, fixed_length_vector, 0
				;
				; NOTE: Currently shufflevector does not support scalable vectors so it cannot
				; be used to model the above operations. Instead these tests rely on knowing
				; how fixed length operations are lowered to scalable ones, with multiple blocks
				; ensuring insert/extract sequences are not folded away.

				cameron.mcinally: Nit: could probably mark the loads/stores volatile to avoid the branch.
				paulwalker-arm: I'm not sure how this helps. The reason for the branch is to force a block boundary to ensure the extract_subvector resulting from lowering the load is not combined with the insert_subvector that's created when lowering the store.
				cameron.mcinally: Ok, that makes sense. Let me ask the opposite though -- if the load and store are volatile, will the fixed-width lowering honor the volatile?
					x = load volatile p
					y = fneg x
					x = load volatile p
					store p, x
				Unlikely a problem considering the loads made it to the backend, but would be good to confirm.
				paulwalker-arm: The resulting masked memory operation takes the same MachineMemOperand as the original fixed length operation, so the volatile flag will be maintained.
				target triple = "aarch64-unknown-linux-gnu"

				; Don't use SVE when its registers are no bigger than NEON.
				; NO_SVE-NOT: ptrue

				define void @subvector_v8i32(<8 x i32>* %in, <8 x i32>* %out) #0 {
				; CHECK-LABEL: subvector_v8i32:
				; CHECK: ptrue [[PG:p[0-9]+]].s, vl8
				; CHECK: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0]
				; CHECK: st1w { [[DATA]] }, [[PG]], [x1]
				; CHECK: ret
				%a = load <8 x i32>, <8 x i32>* %in
				br label %bb1

				bb1:
				store <8 x i32> %a, <8 x i32>* %out
				ret void
				}

				define void @subvector_v16i32(<16 x i32>* %in, <16 x i32>* %out) #0 {
				; CHECK-LABEL: subvector_v16i32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16
				; VBITS_GE_512: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0]
				; VBITS_GE_512: st1w { [[DATA]] }, [[PG]], [x1]
				; CHECK: ret
				%a = load <16 x i32>, <16 x i32>* %in
				br label %bb1

				bb1:
				store <16 x i32> %a, <16 x i32>* %out
				ret void
				}

				define void @subvector_v32i32(<32 x i32>* %in, <32 x i32>* %out) #0 {
				; CHECK-LABEL: subvector_v32i32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32
				; VBITS_GE_1024: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0]
				; VBITS_GE_1024: st1w { [[DATA]] }, [[PG]], [x1]
				; CHECK: ret
				%a = load <32 x i32>, <32 x i32>* %in
				br label %bb1

				bb1:
				store <32 x i32> %a, <32 x i32>* %out
				ret void
				}

				define void @subvector_v64i32(<64 x i32>* %in, <64 x i32>* %out) #0 {
				; CHECK-LABEL: subvector_v64i32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64
				; VBITS_GE_2048: ld1w { [[DATA:z[0-9]+.s]] }, [[PG]]/z, [x0]
				; VBITS_GE_2048: st1w { [[DATA]] }, [[PG]], [x1]
				; CHECK: ret
				%a = load <64 x i32>, <64 x i32>* %in
				br label %bb1

				bb1:
				store <64 x i32> %a, <64 x i32>* %out
				ret void
				}

				attributes #0 = { "target-features"="+sve" }
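
The check prefixes in the RUN lines above suggest a simple rule for when a fixed length vector is expected to be lowered through SVE: it must be wider than a NEON register and must fit within the guaranteed minimum SVE register size. The sketch below only models that rule as inferred from the test; usesSVEForFixedLength is a hypothetical name, and the real useSVEForFixedLengthVectorVT may apply further conditions that are not visible in this diff.

```cpp
#include <cassert>

// Hypothetical model inferred from the RUN lines: FixedBits is the size of the
// fixed length vector, MinSVEBits the value of -aarch64-sve-vector-bits-min.
bool usesSVEForFixedLength(unsigned FixedBits, unsigned MinSVEBits) {
  // NEON sized (or smaller) vectors do not need SVE.
  if (FixedBits <= 128)
    return false;
  // The whole fixed length vector must fit in one SVE register.
  return FixedBits <= MinSVEBits;
}
```

Under this model, <8 x i32> (256 bits) uses SVE from a minimum of 256 bits upwards, matching the CHECK prefix, while <64 x i32> (2048 bits) only does so at 2048, matching VBITS_GE_2048.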

Revision Contents

Diff 276355

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/sve-fixed-length-subvector.ll
