This is an archive of the discontinued LLVM Phabricator instance.

Differential D135564

[AArch64-SVE]: Force generating code compatible to streaming mode.
ClosedPublic

Authored by hassnaa-arm on Oct 10 2022, 2:02 AM.

Details

Reviewers

david-arm
sdesmalen
paulwalker-arm

Commits

rG681888e3ab34: [AArch64-SVE]: Force generating code compatible to streaming mode.

Summary

When streaming mode is enabled, lower some operations and disable some code paths so that the generated code is compatible with streaming mode.
Add test files for shifts, build_vector, concat, extract_subvector, extract_vector_elt, and shuffle.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hassnaa-arm created this revision. · Oct 10 2022, 2:02 AM

Herald added a project: Restricted Project. · Oct 10 2022, 2:02 AM

Herald added subscribers: hiraditya, kristof.beyls, tschuett.

hassnaa-arm requested review of this revision. · Oct 10 2022, 2:02 AM

Herald added a project: Restricted Project. · Oct 10 2022, 2:02 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B191235: Diff 466453. · Oct 10 2022, 2:03 AM

hassnaa-arm added reviewers: david-arm, sdesmalen. · Oct 10 2022, 2:03 AM

hassnaa-arm added a parent revision: D133433: [AArch64]: Force generating code compatible to streaming mode. · Oct 10 2022, 2:08 AM

Matt added a subscriber: Matt. · Oct 10 2022, 10:51 PM

Add additional test cases

Harbormaster completed remote builds in B191652: Diff 467015. · Oct 11 2022, 10:10 PM

hassnaa-arm added a child revision: D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.. · Oct 11 2022, 10:17 PM

Get the latest changes from the parent revision

Harbormaster completed remote builds in B191654: Diff 467020. · Oct 11 2022, 10:28 PM

Update by rebasing on the parent revision

Harbormaster completed remote builds in B191921: Diff 467406. · Oct 13 2022, 2:19 AM

Remove unrelated changes

Harbormaster completed remote builds in B191930: Diff 467419. · Oct 13 2022, 3:05 AM

Restore some changes removed by mistake

Harbormaster completed remote builds in B191931: Diff 467421. · Oct 13 2022, 3:10 AM

Update from the parent branch

Harbormaster completed remote builds in B191950: Diff 467451. · Oct 13 2022, 5:32 AM

Update from the parent patch

Harbormaster completed remote builds in B192008: Diff 467540. · Oct 13 2022, 11:53 AM

SjoerdMeijer added a subscriber: SjoerdMeijer. · Oct 14 2022, 4:03 AM

SjoerdMeijer added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12110	Drive by comment: we don't need to pass `Subtarget->forceStreamingCompatibleSVE()` but can just query that inside `useSVEForFixedLengthVectorVT`?

hassnaa-arm added inline comments. · Oct 14 2022, 4:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12110	I think we still need to pass it, because `useSVEForFixedLengthVectorVT` is used in many places, some of them don't need the flag to be enabled. The flag is used for enabling lowering specific nodes that cause generating invalid code in streaming mode.
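
The trade-off described in this reply can be illustrated with a toy model. The sketch below is not the LLVM implementation: `ToySubtarget`, `ToyVT`, and `useSVEForFixedLengthVT` are hypothetical stand-ins, and the size rule is a deliberate simplification. It only shows why an opt-in argument lets individual call sites request SVE lowering for NEON-sized vectors while every other caller keeps the default behaviour.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real subtarget and value-type classes.
struct ToySubtarget {
  bool UseSVEForFixedLength = false;     // SVE preferred for wide fixed vectors
  bool ForceStreamingCompatible = false; // streaming-compatible code requested
};

struct ToyVT {
  unsigned SizeInBits;
};

// Decide whether a fixed-length vector type should be lowered through SVE.
// NEON-sized types (at most 128 bits) stay on NEON unless the caller opts in
// through OverrideNEON; wider types follow the subtarget preference.
bool useSVEForFixedLengthVT(const ToySubtarget &ST, ToyVT VT,
                            bool OverrideNEON = false) {
  if (VT.SizeInBits <= 128)
    return OverrideNEON;
  return ST.UseSVEForFixedLength;
}
```

In this model, a call site that must avoid NEON in streaming mode would call `useSVEForFixedLengthVT(ST, VT, ST.ForceStreamingCompatible)`, while call sites that should be unaffected simply omit the third argument.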

Update with the changes from the parent patch.
When lowering ISD::LOAD, stop enabling LowerFixedLengthVectorLoadToSVE;
it is no longer needed because zero_extend is custom-lowered.
Previously, LowerLOAD() created a zero_extend node, which led to invalid generated code,
but now the zero_extend node is custom-lowered, so the generated code is valid.

Harbormaster completed remote builds in B192185: Diff 467786. · Oct 14 2022, 10:07 AM

hassnaa-arm added a reviewer: paulwalker-arm. · Oct 17 2022, 3:33 AM

Update by parent patch

Harbormaster completed remote builds in B192461: Diff 468153. · Oct 17 2022, 3:51 AM

hassnaa-arm retitled this revision from [AArch64-SVE]: Force generating code compatible to streaming mode. to [AArch64-SVE]: Force generating code compatible to streaming mode for (masked/extending/truncating) load/store. · Oct 20 2022, 5:01 AM

Hi @hassnaa-arm, I think you have the tests the wrong way around. The tests from D136147 should be part of this patch, because this is the patch where you're implementing the lowering of the operations you're testing in D136147.
After this patch, you get the masked/truncating/extending load/store operations "for free", so the tests for those operations could be moved to a separate test-only patch like D136147.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	It should be able to use standard scalar instructions for v1i64 in streaming-compatible mode, so this one can be removed from the list.
1397	Most scalar FP operations are valid in streaming mode, so we probably don't need to do anything custom for this type.
1608–1610	nit: Perhaps it doesn't lead to an error, but these operations only operate on integers, so should be guarded by: if (VT.isInteger()) { ... }
12281	nit: rather than wrapping this in another condition, can you just add it to the existing condition with `&& !Subtarget->forceStreamingCompatibleSVE()` ?
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3049–3050	nit: Can you change this into:
  let Predicates = [NotInStreamingSVEMode], AddedComplexity = 1 in {
    def : Pat<...>
    ..
  }
  let Predicates = [NotInStreamingSVEMode] in {
    def : Pat<..>
    ...
  }
rather than indenting?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll
312 ↗	(On Diff #468153)	Can you remove all tests that are larger than "twice the size" of a 128bit vector (v32f32 is 8x the size, I'm not sure what value that adds for the testing of this functionality)

hassnaa-arm marked 6 inline comments as done. · Oct 21 2022, 5:33 AM

Remove all tests that are larger than "twice the size" of a 128bit vector.
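
The size rule behind this cleanup is simple arithmetic: a vector's width is its lane count times its element width. The small sketch below uses hypothetical helper names and assumes the "twice the size" limit is inclusive; it checks that v32f32 is 8x a 128-bit vector and shows which sizes stay within the limit.

```cpp
#include <cassert>

// Total width in bits of a fixed-length vector with NumElts lanes.
constexpr unsigned vectorBits(unsigned NumElts, unsigned EltBits) {
  return NumElts * EltBits;
}

// True when the vector is at most twice the width of a 128-bit vector.
// Treating the limit as inclusive is an assumption of this sketch.
constexpr bool withinTwiceNEONWidth(unsigned NumElts, unsigned EltBits) {
  return vectorBits(NumElts, EltBits) <= 2 * 128;
}

static_assert(vectorBits(32, 32) == 8 * 128, "v32f32 is 8x a 128-bit vector");
static_assert(!withinTwiceNEONWidth(32, 32), "v32f32 exceeds the limit");
static_assert(withinTwiceNEONWidth(8, 32), "v8f32 (256 bits) is within it");
```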

Remove masked/truncating/extending load/store, to be added in a test-only patch.

hassnaa-arm retitled this revision from [AArch64-SVE]: Force generating code compatible to streaming mode for (masked/extending/truncating) load/store to [AArch64-SVE]: Force generating code compatible to streaming mode.. · Oct 21 2022, 6:26 AM

hassnaa-arm edited the summary of this revision.

Harbormaster completed remote builds in B193498: Diff 469573. · Oct 21 2022, 7:13 AM

Lower the AND operation, and disable replacing 'and' with 'bic' during the combine step.

Update by parent patch

Revert changes added by mistake

hassnaa-arm added a child revision: D136147: [AArch64-SVE]: Test enabling streaming mode for tests of: shifts, extract subverter, build vector, concat, and extract vector elt. · Oct 21 2022, 8:56 AM

Harbormaster completed remote builds in B193546: Diff 469638. · Oct 21 2022, 9:53 AM

Add test files for shifts, build_vector, concat, extract_subvector, extract_vector_elt, and shuffle.

hassnaa-arm added a child revision: D136585: [AArch64-SVE]: Add tests for masked/truncating/extending load/store while streaming mode is enabled.. · Oct 24 2022, 2:39 AM

hassnaa-arm edited the summary of this revision. · Oct 24 2022, 2:59 AM

hassnaa-arm edited the summary of this revision.

Harbormaster completed remote builds in B193897: Diff 470094. · Oct 24 2022, 3:13 AM

Hi @hassnaa-arm, I think it's almost there! Most of the tests look good to me. I just had a few minor comments ...

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11373	I don't think you need to pass in the `Subtarget` here. In the code below you can just do:
  if (VT.isFixedLengthVector() &&
      DAG.getSubtarget<AArch64Subtarget>().forceStreamingCompatibleSVE())
    return SDValue();
11424	Same comment as above for `tryAdvSIMDModImm32`
13997	nit: The comment can probably be formatted better I think so that you use up 80 chars, i.e.:
  // Skip if streaming compatible SVE is enabled, because it generates invalid
  // code in streaming mode when SVE length is not specified.
15735	Again, you can avoid passing in the subtarget here if you make the changes to `tryAdvSIMDModImm32` and `tryAdvSIMDModImm16`.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3044	When we guard something by a predicate we normally add a comment on the final brace '}' to make it easy to see, i.e. something like: } // End NotInStreamingSVEMode
3058	} // End NotInStreamingSVEMode
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-concat.ll
10 ↗	(On Diff #470094)	I don't think we need to have `vscale_range(2,0)` on these tests, right? We want streaming SVE to work for vector lengths.
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-extract-vector-elt.ll
17 ↗	(On Diff #470094)	nit: I think in all these tests there should only be 2 spaces at the start of the IR, i.e. `%r = extractelement <2 x half> %op1, i64 1` etc.
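
The suggestion made for `tryAdvSIMDModImm32` in the comments above (query the subtarget through the DAG instead of threading an extra pointer parameter through every caller) can be sketched with a toy model. All names below (`ToySubtarget`, `ToyDAG`, `ToyOp`, `ToyResult`, `tryToyModImm`) are hypothetical and stand in for the real LLVM classes; the sketch only shows the shape of the early-return guard.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget.
struct ToySubtarget {
  bool ForceStreamingCompatible = false;
};

// Hypothetical stand-in for the DAG, which already knows its subtarget.
struct ToyDAG {
  const ToySubtarget &ST;
  const ToySubtarget &getSubtarget() const { return ST; }
};

struct ToyOp {
  bool IsFixedLengthVector;
  unsigned Imm;
};

// Matched == false plays the role of returning an empty SDValue().
struct ToyResult {
  bool Matched;
  unsigned Imm;
};

// The helper needs no extra subtarget parameter: it asks the DAG instead.
ToyResult tryToyModImm(const ToyOp &Op, const ToyDAG &DAG) {
  // Bail out early so that no NEON-only immediate form is produced.
  if (Op.IsFixedLengthVector && DAG.getSubtarget().ForceStreamingCompatible)
    return ToyResult{false, 0};
  return ToyResult{true, Op.Imm};
}
```

Because the guard lives inside the helper, existing callers keep their argument lists unchanged, which is the simplification the reviewer asked for.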

Remove vscale_range from concat.ll test file.

Fix indentation

Harbormaster completed remote builds in B194663: Diff 471171. · Oct 27 2022, 8:50 AM

LGTM! Thanks for making the changes @hassnaa-arm.

This revision is now accepted and ready to land. · Oct 27 2022, 11:18 AM

paulwalker-arm added inline comments. · Oct 27 2022, 4:28 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22396–22397	As with the above change can this be `V.getValueType().isFixedLengthVector() && isTypeLegal(V.getValueType()) &&`?
llvm/test/CodeGen/AArch64/sve-streaming-fixed-length-int-shifts.ll
1–5 ↗	(On Diff #471171)	Please rename this file `sve-streaming-mode-fixed-length-int-shifts.ll` to match the same format as the others.

hassnaa-arm marked 2 inline comments as done. · Oct 28 2022, 3:57 AM

Rename sve-streaming-fixed-length-int-shifts.ll to sve-streaming-mode-fixed-length-int-shifts.ll

paulwalker-arm accepted this revision. · Oct 28 2022, 4:04 AM

LGTM as well (please address nit before submitting)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22396	nit: Does this cross the 80-character limit? (please use clang-format to be sure)

hassnaa-arm marked an inline comment as done. · Oct 28 2022, 4:41 AM

Harbormaster completed remote builds in B194890: Diff 471483. · Oct 28 2022, 4:49 AM

sdesmalen added inline comments. · Oct 28 2022, 8:01 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1614	nit: I only just spotted this now in one of your other patches, but ISD::AND should also be guarded by `VT.isInteger()`.

paulwalker-arm added inline comments. · Oct 28 2022, 8:08 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1614	Although not wrong it doesn't really matter as legalisation is smart enough to not care about the operation action for types that make no sense. We rely on this in `addTypeForFixedLengthSVE` where the type is only considered when handling extend-loads/truncating-store plus the odd compare.

sdesmalen added inline comments. · Oct 28 2022, 8:12 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1614	In that case it's probably better to remove the condition entirely.
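
The exchange above turns on how operation actions are recorded per (operation, type) pair. The toy table below is a simplified model, not LLVM's legalizer; `ToyLowering`, `Op`, `Ty`, and `Action` are hypothetical names. It mirrors the shape shown in Diff 469638, where the extend operations are guarded by an integer check and AND is not, and it illustrates that an unguarded entry is just a table row that only matters if a node with that operation and type is ever looked up.

```cpp
#include <cassert>
#include <map>
#include <utility>

enum class Op { ZeroExtend, And };
enum class Ty { v4i32, v4f32 };
enum class Action { Legal, Custom };

bool isIntegerTy(Ty T) { return T == Ty::v4i32; }

class ToyLowering {
  // One entry per (operation, type) pair; missing entries default to Legal.
  std::map<std::pair<Op, Ty>, Action> Actions;

public:
  void setOperationAction(Op O, Ty T, Action A) {
    Actions[std::make_pair(O, T)] = A;
  }

  Action getOperationAction(Op O, Ty T) const {
    auto It = Actions.find(std::make_pair(O, T));
    return It == Actions.end() ? Action::Legal : It->second;
  }

  // Extends are registered only for integer types; AND is registered for all.
  void addTypeForStreaming(Ty T) {
    if (isIntegerTy(T))
      setOperationAction(Op::ZeroExtend, T, Action::Custom);
    setOperationAction(Op::And, T, Action::Custom);
  }
};
```

In this model the AND entry for `v4f32` exists but changes nothing unless something queries that exact pair, which is the sense in which the guard was described as harmless either way.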

hassnaa-arm marked an inline comment as done. · Oct 28 2022, 10:22 AM

This revision was landed with ongoing or failed builds. · Oct 31 2022, 4:03 AM

Closed by commit rG681888e3ab34: [AArch64-SVE]: Force generating code compatible to streaming mode. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

hassnaa-arm added a commit: rG681888e3ab34: [AArch64-SVE]: Force generating code compatible to streaming mode..

CarolineConcatto mentioned this in D147040: [AArch64][CodeGen] Use interleave store for streaming compatible functions. · Mar 28 2023, 4:16 AM

CarolineConcatto mentioned this in rGc8192670ecc7: [AArch64][CodeGen] Use interleave store for streaming compatible functions. · Apr 13 2023, 1:45 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	80 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	6 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll	16 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll	3 lines

Diff 469638

llvm/lib/Target/AArch64/AArch64ISelLowering.h

	private:			private:
	/// Keep a pointer to the AArch64Subtarget around so that we can			/// Keep a pointer to the AArch64Subtarget around so that we can
	/// make the right decision when generating code for different targets.			/// make the right decision when generating code for different targets.
	const AArch64Subtarget *Subtarget;			const AArch64Subtarget *Subtarget;

	bool isExtFreeImpl(const Instruction *Ext) const override;			bool isExtFreeImpl(const Instruction *Ext) const override;

	void addTypeForNEON(MVT VT);			void addTypeForNEON(MVT VT);
				void addTypeForStreamingSVE(MVT VT);
	void addTypeForFixedLengthSVE(MVT VT);			void addTypeForFixedLengthSVE(MVT VT);
	void addDRTypeForNEON(MVT VT);			void addDRTypeForNEON(MVT VT);
	void addQRTypeForNEON(MVT VT);			void addQRTypeForNEON(MVT VT);

	unsigned allocateLazySaveBuffer(SDValue &Chain, const SDLoc &DL,			unsigned allocateLazySaveBuffer(SDValue &Chain, const SDLoc &DL,
	SelectionDAG &DAG, Register &Reg) const;			SelectionDAG &DAG, Register &Reg) const;

	SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,			SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]	if (Subtarget->hasSVE()) {
// NEON doesn't support 64-bit vector integer muls, but SVE does.		// NEON doesn't support 64-bit vector integer muls, but SVE does.
setOperationAction(ISD::MUL, MVT::v1i64, Custom);		setOperationAction(ISD::MUL, MVT::v1i64, Custom);
setOperationAction(ISD::MUL, MVT::v2i64, Custom);		setOperationAction(ISD::MUL, MVT::v2i64, Custom);

// NEON doesn't support across-vector reductions, but SVE does.		// NEON doesn't support across-vector reductions, but SVE does.
for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v2f64})		for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v2f64})
setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);		setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);

		if (Subtarget->forceStreamingCompatibleSVE()) {
		for (MVT VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
		MVT::v4i32, MVT::v2i64})
		addTypeForStreamingSVE(VT);

		for (MVT VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v2f64})
		addTypeForStreamingSVE(VT);
		}

// NOTE: Currently this has to happen after computeRegisterProperties rather		// NOTE: Currently this has to happen after computeRegisterProperties rather
// than the preferred option of combining it with the addRegisterClass call.		// than the preferred option of combining it with the addRegisterClass call.
if (Subtarget->useSVEForFixedLengthVectors()) {		if (Subtarget->useSVEForFixedLengthVectors()) {
for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())		for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
if (useSVEForFixedLengthVectorVT(VT))		if (useSVEForFixedLengthVectorVT(VT))
addTypeForFixedLengthSVE(VT);		addTypeForFixedLengthSVE(VT);
for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())		for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())
if (useSVEForFixedLengthVectorVT(VT))		if (useSVEForFixedLengthVectorVT(VT))
[...]	bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,

// The whilelo instruction only works with i32 or i64 scalar inputs.		// The whilelo instruction only works with i32 or i64 scalar inputs.
if (OpVT != MVT::i32 && OpVT != MVT::i64)		if (OpVT != MVT::i32 && OpVT != MVT::i64)
return true;		return true;

return false;		return false;
}		}

		void AArch64TargetLowering::addTypeForStreamingSVE(MVT VT) {
		if(VT.isInteger()) {
		setOperationAction(ISD::ANY_EXTEND, VT, Custom);
		setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
		setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
		}
		setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
		setOperationAction(ISD::AND, VT, Custom);
		}

void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {		void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");		assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");

// By default everything must be expanded.		// By default everything must be expanded.
for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)		for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
setOperationAction(Op, VT, Expand);		setOperationAction(Op, VT, Expand);

// We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one.		// We use EXTRACT_SUBVECTOR to "cast" a scalable vector to a fixed length one.
[...]	case ISD::SIGN_EXTEND_INREG: {
return LowerToPredicatedOp(Op, DAG,		return LowerToPredicatedOp(Op, DAG,
AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU);		AArch64ISD::SIGN_EXTEND_INREG_MERGE_PASSTHRU);
}		}
case ISD::TRUNCATE:		case ISD::TRUNCATE:
return LowerTRUNCATE(Op, DAG);		return LowerTRUNCATE(Op, DAG);
case ISD::MLOAD:		case ISD::MLOAD:
return LowerMLOAD(Op, DAG);		return LowerMLOAD(Op, DAG);
case ISD::LOAD:		case ISD::LOAD:
if (useSVEForFixedLengthVectorVT(Op.getValueType(),		if (useSVEForFixedLengthVectorVT(Op.getValueType()))
Subtarget->forceStreamingCompatibleSVE()))
return LowerFixedLengthVectorLoadToSVE(Op, DAG);		return LowerFixedLengthVectorLoadToSVE(Op, DAG);
return LowerLOAD(Op, DAG);		return LowerLOAD(Op, DAG);
case ISD::ADD:		case ISD::ADD:
case ISD::AND:		case ISD::AND:
case ISD::SUB:		case ISD::SUB:
return LowerToScalableOp(Op, DAG);		return LowerToScalableOp(Op, DAG);
case ISD::FMAXIMUM:		case ISD::FMAXIMUM:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMAX_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMAX_PRED);
[...]	static SDValue tryAdvSIMDModImm64(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
}		}

return SDValue();		return SDValue();
}		}

// Try 32-bit splatted SIMD immediate.		// Try 32-bit splatted SIMD immediate.
static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,		static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
const APInt &Bits,		const APInt &Bits,
const SDValue *LHS = nullptr) {		const SDValue *LHS = nullptr,
		const AArch64Subtarget *const Subtarget = nullptr) {
		EVT VT = Op.getValueType();
		if(Subtarget && VT.isFixedLengthVector() && Subtarget->forceStreamingCompatibleSVE())
		return SDValue();

if (Bits.getHiBits(64) == Bits.getLoBits(64)) {		if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();		uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v4i32 : MVT::v2i32;		MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v4i32 : MVT::v2i32;
bool isAdvSIMDModImm = false;		bool isAdvSIMDModImm = false;
uint64_t Shift;		uint64_t Shift;

if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType1(Value))) {		if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType1(Value))) {
[...]	static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
}		}

return SDValue();		return SDValue();
}		}

// Try 16-bit splatted SIMD immediate.		// Try 16-bit splatted SIMD immediate.
static SDValue tryAdvSIMDModImm16(unsigned NewOp, SDValue Op, SelectionDAG &DAG,		static SDValue tryAdvSIMDModImm16(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
const APInt &Bits,		const APInt &Bits,
const SDValue *LHS = nullptr) {		const SDValue *LHS = nullptr,
		const AArch64Subtarget *const Subtarget = nullptr) {
		EVT VT = Op.getValueType();
		if(Subtarget && VT.isFixedLengthVector() && Subtarget->forceStreamingCompatibleSVE())
		return SDValue();

if (Bits.getHiBits(64) == Bits.getLoBits(64)) {		if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();		uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v8i16 : MVT::v4i16;		MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v8i16 : MVT::v4i16;
bool isAdvSIMDModImm = false;		bool isAdvSIMDModImm = false;
uint64_t Shift;		uint64_t Shift;

if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType5(Value))) {		if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType5(Value))) {
[...]	SDValue AArch64TargetLowering::LowerBUILD_VECTOR(SDValue Op,
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LowerBUILD_VECTOR: use default expansion, failed to find "		dbgs() << "LowerBUILD_VECTOR: use default expansion, failed to find "
"better alternative\n");		"better alternative\n");
return SDValue();		return SDValue();
}		}

SDValue AArch64TargetLowering::LowerCONCAT_VECTORS(SDValue Op,		SDValue AArch64TargetLowering::LowerCONCAT_VECTORS(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
if (useSVEForFixedLengthVectorVT(Op.getValueType()))		if (useSVEForFixedLengthVectorVT(Op.getValueType(),
		Subtarget->forceStreamingCompatibleSVE()))
return LowerFixedLengthConcatVectorsToSVE(Op, DAG);		return LowerFixedLengthConcatVectorsToSVE(Op, DAG);

assert(Op.getValueType().isScalableVector() &&		assert(Op.getValueType().isScalableVector() &&
isTypeLegal(Op.getValueType()) &&		isTypeLegal(Op.getValueType()) &&
"Expected legal scalable vector type!");		"Expected legal scalable vector type!");

if (isTypeLegal(Op.getOperand(0).getValueType())) {		if (isTypeLegal(Op.getOperand(0).getValueType())) {
unsigned NumOperands = Op->getNumOperands();		unsigned NumOperands = Op->getNumOperands();
[...]	if (VT.getScalarType() == MVT::i1) {
SDValue Extend =		SDValue Extend =
DAG.getNode(ISD::ANY_EXTEND, DL, VectorVT, Op.getOperand(0));		DAG.getNode(ISD::ANY_EXTEND, DL, VectorVT, Op.getOperand(0));
MVT ExtractTy = VectorVT == MVT::nxv2i64 ? MVT::i64 : MVT::i32;		MVT ExtractTy = VectorVT == MVT::nxv2i64 ? MVT::i64 : MVT::i32;
SDValue Extract = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtractTy,		SDValue Extract = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ExtractTy,
Extend, Op.getOperand(1));		Extend, Op.getOperand(1));
return DAG.getAnyExtOrTrunc(Extract, DL, Op.getValueType());		return DAG.getAnyExtOrTrunc(Extract, DL, Op.getValueType());
}		}

if (useSVEForFixedLengthVectorVT(VT))		if (useSVEForFixedLengthVectorVT(VT,
		Subtarget->forceStreamingCompatibleSVE()))
return LowerFixedLengthExtractVectorElt(Op, DAG);		return LowerFixedLengthExtractVectorElt(Op, DAG);

// Check for non-constant or out of range lane.		// Check for non-constant or out of range lane.
ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));		ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(1));
if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())		if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
return SDValue();		return SDValue();

// Insertion/extraction are legal for V128 types.		// Insertion/extraction are legal for V128 types.
[...]	SDValue AArch64TargetLowering::LowerEXTRACT_SUBVECTOR(SDValue Op,

// This will get lowered to an appropriate EXTRACT_SUBREG in ISel.		// This will get lowered to an appropriate EXTRACT_SUBREG in ISel.
if (Idx == 0 && InVT.getSizeInBits() <= 128)		if (Idx == 0 && InVT.getSizeInBits() <= 128)
return Op;		return Op;

// If this is extracting the upper 64-bits of a 128-bit vector, we match		// If this is extracting the upper 64-bits of a 128-bit vector, we match
// that directly.		// that directly.
if (Size == 64 && Idx * InVT.getScalarSizeInBits() == 64 &&		if (Size == 64 && Idx * InVT.getScalarSizeInBits() == 64 &&
InVT.getSizeInBits() == 128)		InVT.getSizeInBits() == 128 &&
		!Subtarget->forceStreamingCompatibleSVE())
return Op;		return Op;

if (useSVEForFixedLengthVectorVT(InVT)) {		if (useSVEForFixedLengthVectorVT(InVT,
		Subtarget->forceStreamingCompatibleSVE())) {
SDLoc DL(Op);		SDLoc DL(Op);

EVT ContainerVT = getContainerForFixedLengthVector(DAG, InVT);		EVT ContainerVT = getContainerForFixedLengthVector(DAG, InVT);
SDValue NewInVec =		SDValue NewInVec =
convertToScalableVector(DAG, ContainerVT, Op.getOperand(0));		convertToScalableVector(DAG, ContainerVT, Op.getOperand(0));

SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, ContainerVT, NewInVec,		SDValue Splice = DAG.getNode(ISD::VECTOR_SPLICE, DL, ContainerVT, NewInVec,
NewInVec, DAG.getConstant(Idx, DL, MVT::i64));		NewInVec, DAG.getConstant(Idx, DL, MVT::i64));
[...]	SDValue AArch64TargetLowering::LowerDIV(SDValue Op, SelectionDAG &DAG) const {
SDValue Op1Hi = DAG.getNode(UnpkHi, dl, WidenedVT, Op.getOperand(1));		SDValue Op1Hi = DAG.getNode(UnpkHi, dl, WidenedVT, Op.getOperand(1));
SDValue ResultLo = DAG.getNode(Op.getOpcode(), dl, WidenedVT, Op0Lo, Op1Lo);		SDValue ResultLo = DAG.getNode(Op.getOpcode(), dl, WidenedVT, Op0Lo, Op1Lo);
SDValue ResultHi = DAG.getNode(Op.getOpcode(), dl, WidenedVT, Op0Hi, Op1Hi);		SDValue ResultHi = DAG.getNode(Op.getOpcode(), dl, WidenedVT, Op0Hi, Op1Hi);
return DAG.getNode(AArch64ISD::UZP1, dl, VT, ResultLo, ResultHi);		return DAG.getNode(AArch64ISD::UZP1, dl, VT, ResultLo, ResultHi);
}		}

bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {		bool AArch64TargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {
// Currently no fixed length shuffles that require SVE are legal.		// Currently no fixed length shuffles that require SVE are legal.
if (useSVEForFixedLengthVectorVT(VT))		if (useSVEForFixedLengthVectorVT(VT, Subtarget->forceStreamingCompatibleSVE()))
return false;		return false;

if (VT.getVectorNumElements() == 4 &&		if (VT.getVectorNumElements() == 4 &&
(VT.is128BitVector() || VT.is64BitVector())) {		(VT.is128BitVector() || VT.is64BitVector())) {
unsigned Cost = getPerfectShuffleCost(M);		unsigned Cost = getPerfectShuffleCost(M);
if (Cost <= 1)		if (Cost <= 1)
return true;		return true;
}		}
[...]	SDValue AArch64TargetLowering::LowerVectorSRA_SRL_SHL(SDValue Op,
int64_t Cnt;		int64_t Cnt;

if (!Op.getOperand(1).getValueType().isVector())		if (!Op.getOperand(1).getValueType().isVector())
return Op;		return Op;
unsigned EltSize = VT.getScalarSizeInBits();		unsigned EltSize = VT.getScalarSizeInBits();

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
case ISD::SHL:		case ISD::SHL:
if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT))		if (VT.isScalableVector() ||
		useSVEForFixedLengthVectorVT(VT,
		Subtarget->forceStreamingCompatibleSVE()))
return LowerToPredicatedOp(Op, DAG, AArch64ISD::SHL_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::SHL_PRED);

if (isVShiftLImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize)		if (isVShiftLImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize)
return DAG.getNode(AArch64ISD::VSHL, DL, VT, Op.getOperand(0),		return DAG.getNode(AArch64ISD::VSHL, DL, VT, Op.getOperand(0),
DAG.getConstant(Cnt, DL, MVT::i32));		DAG.getConstant(Cnt, DL, MVT::i32));
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,		return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,
DAG.getConstant(Intrinsic::aarch64_neon_ushl, DL,		DAG.getConstant(Intrinsic::aarch64_neon_ushl, DL,
MVT::i32),		MVT::i32),
Op.getOperand(0), Op.getOperand(1));		Op.getOperand(0), Op.getOperand(1));
case ISD::SRA:		case ISD::SRA:
case ISD::SRL:		case ISD::SRL:
if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT)) {		if (VT.isScalableVector() ||
		useSVEForFixedLengthVectorVT(
		VT, Subtarget->forceStreamingCompatibleSVE())) {
unsigned Opc = Op.getOpcode() == ISD::SRA ? AArch64ISD::SRA_PRED		unsigned Opc = Op.getOpcode() == ISD::SRA ? AArch64ISD::SRA_PRED
: AArch64ISD::SRL_PRED;		: AArch64ISD::SRL_PRED;
return LowerToPredicatedOp(Op, DAG, Opc);		return LowerToPredicatedOp(Op, DAG, Opc);
}		}

// Right shift immediate		// Right shift immediate
if (isVShiftRImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize) {		if (isVShiftRImm(Op.getOperand(1), VT, false, Cnt) && Cnt < EltSize) {
unsigned Opc =		unsigned Opc =
[...]
/// Into:		/// Into:
/// %sub.v0 = shuffle <32 x i32> %v0, <32 x i32> v1, <4, 5, 6, 7>		/// %sub.v0 = shuffle <32 x i32> %v0, <32 x i32> v1, <4, 5, 6, 7>
/// %sub.v1 = shuffle <32 x i32> %v0, <32 x i32> v1, <32, 33, 34, 35>		/// %sub.v1 = shuffle <32 x i32> %v0, <32 x i32> v1, <32, 33, 34, 35>
/// %sub.v2 = shuffle <32 x i32> %v0, <32 x i32> v1, <16, 17, 18, 19>		/// %sub.v2 = shuffle <32 x i32> %v0, <32 x i32> v1, <16, 17, 18, 19>
/// call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)		/// call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)
bool AArch64TargetLowering::lowerInterleavedStore(StoreInst *SI,		bool AArch64TargetLowering::lowerInterleavedStore(StoreInst *SI,
ShuffleVectorInst *SVI,		ShuffleVectorInst *SVI,
unsigned Factor) const {		unsigned Factor) const {
		// Skip if streaming compatible SVE is enabled,
		david-arm (inline comment): nit: The comment can probably be formatted better I think so that you use up 80 chars, i.e.:
		    // Skip if streaming compatible SVE is enabled, because it generates invalid
		    // code in streaming mode when SVE length is not specified.
		// because it generates invalid code in streaming mode when SVE length is not specified.
		if(Subtarget->forceStreamingCompatibleSVE())
		return false;

assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&		assert(Factor >= 2 && Factor <= getMaxSupportedInterleaveFactor() &&
"Invalid interleave factor");		"Invalid interleave factor");

auto *VecTy = cast<FixedVectorType>(SVI->getType());		auto *VecTy = cast<FixedVectorType>(SVI->getType());
assert(VecTy->getNumElements() % Factor == 0 && "Invalid interleaved store");		assert(VecTy->getNumElements() % Factor == 0 && "Invalid interleaved store");

unsigned LaneLen = VecTy->getNumElements() / Factor;		unsigned LaneLen = VecTy->getNumElements() / Factor;
Type *EltTy = VecTy->getElementType();		Type *EltTy = VecTy->getElementType();
[... 1,716 lines not shown ...]	static SDValue performSVEAndCombine(SDNode *N,

if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))		if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
return Src;		return Src;

return SDValue();		return SDValue();
}		}

static SDValue performANDCombine(SDNode *N,		static SDValue performANDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI,
		const AArch64Subtarget *const Subtarget) {
		david-arm (inline comment): Again, you can avoid passing in the subtarget here if you make the changes to `tryAdvSIMDModImm32` and `tryAdvSIMDModImm16`.
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (SDValue R = performANDORCSELCombine(N, DAG))		if (SDValue R = performANDORCSELCombine(N, DAG))
return R;		return R;

[... 18 lines not shown ...]	static SDValue performANDCombine(SDNode *N,
// (and x, (movi imm)) form, even though an mvni representation also exists.		// (and x, (movi imm)) form, even though an mvni representation also exists.
APInt DefBits(VT.getSizeInBits(), 0);		APInt DefBits(VT.getSizeInBits(), 0);
APInt UndefBits(VT.getSizeInBits(), 0);		APInt UndefBits(VT.getSizeInBits(), 0);
if (resolveBuildVector(BVN, DefBits, UndefBits)) {		if (resolveBuildVector(BVN, DefBits, UndefBits)) {
SDValue NewOp;		SDValue NewOp;

DefBits = ~DefBits;		DefBits = ~DefBits;
if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,		if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,
DefBits, &LHS)) ||		DefBits, &LHS, Subtarget)) ||
(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,		(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,
DefBits, &LHS)))		DefBits, &LHS, Subtarget)))
return NewOp;		return NewOp;

UndefBits = ~UndefBits;		UndefBits = ~UndefBits;
if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,		if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,
UndefBits, &LHS)) ||		UndefBits, &LHS, Subtarget)) ||
(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,		(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,
UndefBits, &LHS)))		UndefBits, &LHS, Subtarget)))
return NewOp;		return NewOp;
}		}

return SDValue();		return SDValue();
}		}
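The combine above rewrites `(and x, imm)` as a `BICi` of the inverted immediate when that inverse fits an AdvSIMD modified-immediate encoding. Below is a small standalone illustration of the arithmetic for one 32-bit lane, assuming only the form where an 8-bit value is shifted left by 0, 8, 16, or 24 bits. `ModImm32`, `encodeModImm32`, `isBicEncodableMask`, and `andViaBic` are hypothetical helpers for this sketch and are not the LLVM functions named in the diff.

```cpp
#include <cassert>
#include <cstdint>

// One 32-bit modified immediate: an 8-bit payload shifted left by 0/8/16/24.
struct ModImm32 {
  uint8_t Imm8;
  unsigned Shift;
};

// Try to express V as (Imm8 << Shift); returns false if more than one byte
// of V is non-zero.
bool encodeModImm32(uint32_t V, ModImm32 &Out) {
  for (unsigned Shift = 0; Shift < 32; Shift += 8) {
    if ((V & ~(uint32_t{0xFF} << Shift)) == 0) {
      Out = ModImm32{static_cast<uint8_t>(V >> Shift), Shift};
      return true;
    }
  }
  return false;
}

// and(x, Mask) can become bic(x, ~Mask) when ~Mask is encodable.
bool isBicEncodableMask(uint32_t Mask) {
  ModImm32 Unused{};
  return encodeModImm32(~Mask, Unused);
}

// Computes x & Mask by way of the BIC form: x & ~(Imm8 << Shift).
uint32_t andViaBic(uint32_t X, uint32_t Mask) {
  ModImm32 I{};
  bool Ok = encodeModImm32(~Mask, I);
  assert(Ok && "inverted mask is not a single shifted byte");
  (void)Ok;
  return X & ~(uint32_t{I.Imm8} << I.Shift);
}
```

For example, the mask `0xFFFF00FF` inverts to `0x0000FF00`, a single byte shifted by 8, so the AND can be expressed as a bit-clear of that immediate. The mask `0x0000FFFF` inverts to `0xFFFF0000`, which spans two bytes and is rejected by this 32-bit check.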

static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {		static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {
switch (Opcode) {		switch (Opcode) {
[... 4,777 lines not shown ...]	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case ISD::FP_TO_SINT_SAT:		case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:		case ISD::FP_TO_UINT_SAT:
return performFpToIntCombine(N, DAG, DCI, Subtarget);		return performFpToIntCombine(N, DAG, DCI, Subtarget);
case ISD::FDIV:		case ISD::FDIV:
return performFDivCombine(N, DAG, DCI, Subtarget);		return performFDivCombine(N, DAG, DCI, Subtarget);
case ISD::OR:		case ISD::OR:
return performORCombine(N, DCI, Subtarget);		return performORCombine(N, DCI, Subtarget);
case ISD::AND:		case ISD::AND:
return performANDCombine(N, DCI);		return performANDCombine(N, DCI, Subtarget);
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return performIntrinsicCombine(N, DCI, Subtarget);		return performIntrinsicCombine(N, DCI, Subtarget);
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);		return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:		case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);		return performSignExtendInRegCombine(N, DCI, DAG);
[... 1,790 lines not shown ...]
}		}

// If a fixed length vector operation has no side effects when applied to		// If a fixed length vector operation has no side effects when applied to
// undefined elements, we can safely use scalable vectors to perform the same		// undefined elements, we can safely use scalable vectors to perform the same
// operation without needing to worry about predication.		// operation without needing to worry about predication.
SDValue AArch64TargetLowering::LowerToScalableOp(SDValue Op,		SDValue AArch64TargetLowering::LowerToScalableOp(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
assert(useSVEForFixedLengthVectorVT(VT) &&		assert(VT.isFixedLengthVector() && isTypeLegal(VT) &&
"Only expected to lower fixed length vector operation!");		"Only expected to lower fixed length vector operation!");
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);		EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);

// Create list of operands by converting existing ones to scalable types.		// Create list of operands by converting existing ones to scalable types.
SmallVector<SDValue, 4> Ops;		SmallVector<SDValue, 4> Ops;
for (const SDValue &V : Op->op_values()) {		for (const SDValue &V : Op->op_values()) {
assert(!isa<VTSDNode>(V) && "Unexpected VTSDNode node!");		assert(!isa<VTSDNode>(V) && "Unexpected VTSDNode node!");

// Pass through non-vector operands.		// Pass through non-vector operands.
if (!V.getValueType().isVector()) {		if (!V.getValueType().isVector()) {
Ops.push_back(V);		Ops.push_back(V);
continue;		continue;
}		}

// "cast" fixed length vector to a scalable vector.		// "cast" fixed length vector to a scalable vector.
assert(useSVEForFixedLengthVectorVT(V.getValueType()) &&		assert(useSVEForFixedLengthVectorVT(V.getValueType(), Subtarget->forceStreamingCompatibleSVE()) &&
		sdesmalen (inline comment): nit: Does this cross the 80-character limit? (please use clang-format to be sure)
"Only fixed length vectors are supported!");		"Only fixed length vectors are supported!");
		paulwalker-arm (inline comment): As with the above change can this be `V.getValueType().isFixedLengthVector() && isTypeLegal(V.getValueType()) &&`?
Ops.push_back(convertToScalableVector(DAG, ContainerVT, V));		Ops.push_back(convertToScalableVector(DAG, ContainerVT, V));
}		}

auto ScalableRes = DAG.getNode(Op.getOpcode(), SDLoc(Op), ContainerVT, Ops);		auto ScalableRes = DAG.getNode(Op.getOpcode(), SDLoc(Op), ContainerVT, Ops);
return convertFromScalableVector(DAG, VT, ScalableRes);		return convertFromScalableVector(DAG, VT, ScalableRes);
}		}
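The comment on `LowerToScalableOp` states the idea: when an operation has no side effects on undefined lanes, a fixed-length vector can be widened into a scalable container, operated on without predication, and narrowed back. A toy model of that round trip follows, using `std::vector` in place of SelectionDAG values. All names (`toContainer`, `fromContainer`, `addViaContainer`) and the explicit `Filler` parameter are illustrative assumptions; in the real lowering the extra lanes are simply undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<unsigned>;

// "Cast" a fixed-length vector into a wider container. The extra lanes are
// filled with an arbitrary value to stand in for undefined contents.
Vec toContainer(const Vec &Fixed, std::size_t ContainerLanes, unsigned Filler) {
  assert(Fixed.size() <= ContainerLanes && "container too small");
  Vec C(ContainerLanes, Filler);
  std::copy(Fixed.begin(), Fixed.end(), C.begin());
  return C;
}

// Extract the low FixedLanes lanes back out of the container.
Vec fromContainer(const Vec &C, std::size_t FixedLanes) {
  assert(FixedLanes <= C.size() && "cannot extract more lanes than exist");
  return Vec(C.begin(), C.begin() + static_cast<std::ptrdiff_t>(FixedLanes));
}

// Unpredicated lane-wise add over the whole container, then narrow.
Vec addViaContainer(const Vec &A, const Vec &B, std::size_t ContainerLanes,
                    unsigned Filler) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  Vec CA = toContainer(A, ContainerLanes, Filler);
  Vec CB = toContainer(B, ContainerLanes, Filler);
  Vec R(ContainerLanes);
  for (std::size_t I = 0; I < ContainerLanes; ++I)
    R[I] = CA[I] + CB[I]; // extra lanes compute garbage that is discarded
  return fromContainer(R, A.size());
}
```

The result is the same whatever `Filler` is, which is the property the source comment relies on: lane-wise operations never let the undefined upper lanes leak into the lanes that are extracted.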

SDValue AArch64TargetLowering::LowerVECREDUCE_SEQ_FADD(SDValue ScalarOp,		SDValue AArch64TargetLowering::LowerVECREDUCE_SEQ_FADD(SDValue ScalarOp,
[... 569 lines not shown ...]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[... 3,026 lines not shown ...]	let Predicates = [HasSVEorSME] in {
def : Pat<(f32 (vector_extract (nxv4f32 ZPR:$vec), sve_elm_idx_extdup_s:$index)),		def : Pat<(f32 (vector_extract (nxv4f32 ZPR:$vec), sve_elm_idx_extdup_s:$index)),
(EXTRACT_SUBREG (DUP_ZZI_S ZPR:$vec, sve_elm_idx_extdup_s:$index), ssub)>;		(EXTRACT_SUBREG (DUP_ZZI_S ZPR:$vec, sve_elm_idx_extdup_s:$index), ssub)>;
def : Pat<(f32 (vector_extract (nxv2f32 ZPR:$vec), sve_elm_idx_extdup_d:$index)),		def : Pat<(f32 (vector_extract (nxv2f32 ZPR:$vec), sve_elm_idx_extdup_d:$index)),
(EXTRACT_SUBREG (DUP_ZZI_D ZPR:$vec, sve_elm_idx_extdup_d:$index), ssub)>;		(EXTRACT_SUBREG (DUP_ZZI_D ZPR:$vec, sve_elm_idx_extdup_d:$index), ssub)>;
def : Pat<(f64 (vector_extract (nxv2f64 ZPR:$vec), sve_elm_idx_extdup_d:$index)),		def : Pat<(f64 (vector_extract (nxv2f64 ZPR:$vec), sve_elm_idx_extdup_d:$index)),
(EXTRACT_SUBREG (DUP_ZZI_D ZPR:$vec, sve_elm_idx_extdup_d:$index), dsub)>;		(EXTRACT_SUBREG (DUP_ZZI_D ZPR:$vec, sve_elm_idx_extdup_d:$index), dsub)>;

// Extract element from vector with immediate index that's within the bottom 128-bits.		// Extract element from vector with immediate index that's within the bottom 128-bits.
let AddedComplexity = 1 in {		let Predicates = [NotInStreamingSVEMode], AddedComplexity = 1 in {
def : Pat<(i32 (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)),		def : Pat<(i32 (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)),
(i32 (UMOVvi8 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;		(i32 (UMOVvi8 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;
def : Pat<(i32 (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index)),		def : Pat<(i32 (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index)),
(i32 (UMOVvi16 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;		(i32 (UMOVvi16 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;
def : Pat<(i32 (vector_extract (nxv4i32 ZPR:$vec), VectorIndexS:$index)),		def : Pat<(i32 (vector_extract (nxv4i32 ZPR:$vec), VectorIndexS:$index)),
(i32 (UMOVvi32 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index))>;		(i32 (UMOVvi32 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index))>;
def : Pat<(i64 (vector_extract (nxv2i64 ZPR:$vec), VectorIndexD:$index)),		def : Pat<(i64 (vector_extract (nxv2i64 ZPR:$vec), VectorIndexD:$index)),
(i64 (UMOVvi64 (v2i64 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexD:$index))>;		(i64 (UMOVvi64 (v2i64 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexD:$index))>;
}		}
		david-arm (inline comment): When we guard something by a predicate we normally add a comment on the final brace '}' to make it easy to see, i.e. something like:
		    } // End NotInStreamingSVEMode
		let Predicates = [NotInStreamingSVEMode] in {
def : Pat<(sext_inreg (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index), i8),		def : Pat<(sext_inreg (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index), i8),
(i32 (SMOVvi8to32 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;		(i32 (SMOVvi8to32 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;
def : Pat<(sext_inreg (anyext (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)), i8),		def : Pat<(sext_inreg (anyext (vector_extract (nxv16i8 ZPR:$vec), VectorIndexB:$index)), i8),
(i64 (SMOVvi8to64 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;		(i64 (SMOVvi8to64 (v16i8 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexB:$index))>;

		sdesmalen (inline comment): nit: Can you change this into:
		    let Predicates = [NotInStreamingSVEMode], AddedComplexity = 1 in {
		      def : Pat<...>
		      ..
		    }
		    let Predicates = [NotInStreamingSVEMode] in {
		      def : Pat<..>
		      ...
		    }
		Rather than indenting?
def : Pat<(sext_inreg (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index), i16),		def : Pat<(sext_inreg (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index), i16),
(i32 (SMOVvi16to32 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;		(i32 (SMOVvi16to32 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;
def : Pat<(sext_inreg (anyext (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index)), i16),		def : Pat<(sext_inreg (anyext (vector_extract (nxv8i16 ZPR:$vec), VectorIndexH:$index)), i16),
(i64 (SMOVvi16to64 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;		(i64 (SMOVvi16to64 (v8i16 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexH:$index))>;

def : Pat<(sext (vector_extract (nxv4i32 ZPR:$vec), VectorIndexS:$index)),		def : Pat<(sext (vector_extract (nxv4i32 ZPR:$vec), VectorIndexS:$index)),
(i64 (SMOVvi32to64 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index))>;		(i64 (SMOVvi32to64 (v4i32 (EXTRACT_SUBREG ZPR:$vec, zsub)), VectorIndexS:$index))>;
		}
		david-arm (inline comment): } // End NotInStreamingSVEMode
// Extract first element from vector.		// Extract first element from vector.
let AddedComplexity = 2 in {		let AddedComplexity = 2 in {
def : Pat<(vector_extract (nxv16i8 ZPR:$Zs), (i64 0)),		def : Pat<(vector_extract (nxv16i8 ZPR:$Zs), (i64 0)),
(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;		(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;
def : Pat<(vector_extract (nxv8i16 ZPR:$Zs), (i64 0)),		def : Pat<(vector_extract (nxv8i16 ZPR:$Zs), (i64 0)),
(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;		(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;
def : Pat<(vector_extract (nxv4i32 ZPR:$Zs), (i64 0)),		def : Pat<(vector_extract (nxv4i32 ZPR:$Zs), (i64 0)),
(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;		(i32 (EXTRACT_SUBREG ZPR:$Zs, ssub))>;
[... 491 lines not shown ...]

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s			; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	define <4 x i8> @load_v4i8(<4 x i8>* %a) #0 {			define <4 x i8> @load_v4i8(<4 x i8>* %a) #0 {
	; CHECK-LABEL: load_v4i8:			; CHECK-LABEL: load_v4i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.h, vl4			; CHECK-NEXT: ldr s0, [x0]
	; CHECK-NEXT: ld1b { z0.h }, p0/z, [x0]			; CHECK-NEXT: uunpklo z0.h, z0.b
	; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0			; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%load = load <4 x i8>, <4 x i8>* %a			%load = load <4 x i8>, <4 x i8>* %a
	ret <4 x i8> %load			ret <4 x i8> %load
	}			}

	define <8 x i8> @load_v8i8(<8 x i8>* %a) #0 {			define <8 x i8> @load_v8i8(<8 x i8>* %a) #0 {
	; CHECK-LABEL: load_v8i8:			; CHECK-LABEL: load_v8i8:
	[... 20 lines not shown ...]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%load = load <32 x i8>, <32 x i8>* %a			%load = load <32 x i8>, <32 x i8>* %a
	ret <32 x i8> %load			ret <32 x i8> %load
	}			}

	define <2 x i16> @load_v2i16(<2 x i16>* %a) #0 {			define <2 x i16> @load_v2i16(<2 x i16>* %a) #0 {
	; CHECK-LABEL: load_v2i16:			; CHECK-LABEL: load_v2i16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: ldrh w8, [x0, #2]			; CHECK-NEXT: ldrh w8, [x0, #2]
	; CHECK-NEXT: ldrh w9, [x0]			; CHECK-NEXT: str w8, [sp, #12]
	; CHECK-NEXT: fmov s0, w8			; CHECK-NEXT: ldrh w8, [x0]
	; CHECK-NEXT: fmov s1, w9			; CHECK-NEXT: str w8, [sp, #8]
	; CHECK-NEXT: zip1 z0.s, z1.s, z0.s			; CHECK-NEXT: ldr d0, [sp, #8]
	; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0			; CHECK-NEXT: add sp, sp, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%load = load <2 x i16>, <2 x i16>* %a			%load = load <2 x i16>, <2 x i16>* %a
	ret <2 x i16> %load			ret <2 x i16> %load
	}			}

	define <2 x half> @load_v2f16(<2 x half>* %a) #0 {			define <2 x half> @load_v2f16(<2 x half>* %a) #0 {
	; CHECK-LABEL: load_v2f16:			; CHECK-LABEL: load_v2f16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	[... 170 lines not shown ...]

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll

[... 59 lines not shown ...]	; CHECK-NEXT: ret
ret void		ret void
}		}

define void @store_v2f16(<2 x half>* %a) #0 {		define void @store_v2f16(<2 x half>* %a) #0 {
; CHECK-LABEL: store_v2f16:		; CHECK-LABEL: store_v2f16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: adrp x8, .LCPI5_0		; CHECK-NEXT: adrp x8, .LCPI5_0
; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI5_0]		; CHECK-NEXT: ldr d0, [x8, :lo12:.LCPI5_0]
; CHECK-NEXT: str s0, [x0]		; CHECK-NEXT: fmov w8, s0
		; CHECK-NEXT: str w8, [x0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
store <2 x half> zeroinitializer, <2 x half>* %a		store <2 x half> zeroinitializer, <2 x half>* %a
ret void		ret void
}		}

define void @store_v4i16(<4 x i16>* %a) #0 {		define void @store_v4i16(<4 x i16>* %a) #0 {
; CHECK-LABEL: store_v4i16:		; CHECK-LABEL: store_v4i16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[... 182 lines not shown ...]

Revision Contents

Diff 469638