
Details

Reviewers

dmgreen
efriedma
t.p.northover

Commits

rG8bad7ad6d6cb: [AArch64] Reuse larger DUPLANE if available

Summary

When combining a DUP, try to reuse a larger DUPLANE if one is available.
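The idea in the summary can be illustrated with a toy model. This is only a sketch and not the LLVM implementation: `ToyDAG`, `Key`, and `findWiderDup` are hypothetical names, and the real combine works on `SDNode`s via `getNodeIfExists`. The point it shows is that the lookup has to match every operand (source vector and lane), not just the first one, before the narrow node can be replaced by the low half of the wide one.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical key for a node in a toy DAG: opcode, element count,
// source vector name, and the lane being broadcast.
struct Key {
  std::string opcode; // e.g. "DUPLANE16"
  unsigned numElts;   // 4 for v4i16, 8 for v8i16
  std::string src;    // name of the source vector
  unsigned lane;      // lane index operand
  bool operator<(const Key &o) const {
    return std::tie(opcode, numElts, src, lane) <
           std::tie(o.opcode, o.numElts, o.src, o.lane);
  }
};

struct ToyDAG {
  std::map<Key, int> nodes; // key -> node id
  int nextId = 0;

  // Returns the existing node for this key, or creates one.
  int getNode(const Key &k) {
    auto [it, inserted] = nodes.try_emplace(k, nextId);
    if (inserted)
      ++nextId;
    return it->second;
  }

  // Lookup only; never creates a node.
  std::optional<int> getNodeIfExists(const Key &k) const {
    auto it = nodes.find(k);
    if (it == nodes.end())
      return std::nullopt;
    return it->second;
  }
};

// If a node with twice the element count and identical operands already
// exists, return its id; the caller could then use its low half instead
// of materialising the narrow node.
std::optional<int> findWiderDup(const ToyDAG &dag, const Key &narrow) {
  Key wide = narrow;
  wide.numElts *= 2;
  return dag.getNodeIfExists(wide);
}
```

In this model a v4i16 broadcast of lane 3 finds an existing v8i16 broadcast of lane 3 from the same source, while a broadcast of a different lane finds nothing.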

Diff Detail

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp

Event Timeline

jaykang10 created this revision. · Jul 18 2023, 6:00 AM

Herald added a project: Restricted Project. · Jul 18 2023, 6:00 AM

Herald added subscribers: hiraditya, kristof.beyls.

jaykang10 requested review of this revision. · Jul 18 2023, 6:00 AM

Herald added a project: Restricted Project. · Jul 18 2023, 6:00 AM

Herald added a subscriber: llvm-commits.

dmgreen added inline comments. · Jul 18 2023, 7:48 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22119–22122	Should this be excluded for everything that isn't DUP?
llvm/lib/Target/AArch64/AArch64InstrInfo.td
6950 ↗	(On Diff #541489)	Would this be needed for all the "AdvSIMD indexed element" operations? Is there some way to make that fairly generic? Maybe a PatFrag that matches either `v4i16 (AArch64duplane16(..` or `extract_subvector (AArch64duplane16 (..`, that can be used in the instruction patterns like those in SIMDVectorIndexedHSTied.

jaykang10 added inline comments. · Jul 18 2023, 8:27 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22119–22122	I do not know in detail how `performPostLD1Combine` works, but it checks whether the input node is an `ISD::LOAD` and returns `SDValue()` if it is not. To be safe, let me update the code so that `performPostLD1Combine` is not run for DUPLANE.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
6950 ↗	(On Diff #541489)	I was not sure whether the patterns are needed for all `SIMD indexed` operations. If that is OK, let me move the patterns into `SIMDIndexedLongSD`.

jaykang10 added inline comments. · Jul 18 2023, 8:55 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
6950 ↗	(On Diff #541489)	Additionally, when I added a `v2i32_indexed_low` def in `SIMDVectorIndexedLongSD`, I got the decoding conflict below, because the new def has the same encoding as the existing one, so I added Pats instead.

Decoding Conflict:
		0000111110......1010.0..........
		0000111110......1010............
		0000111110......................
		...0111110......................
		...011..........................
		................................
	SMULLv2i32_indexed 0000111110______1010_0__________
	SMULLv2i32_indexed_low 0000111110______1010_0__________
Decoding Conflict:
		0010111110......1010.0..........
		0010111110......1010............
		0010111110......................
		...0111110......................
		...011..........................
		................................
	UMULLv2i32_indexed 0010111110______1010_0__________
	UMULLv2i32_indexed_low 0010111110______1010_0__________
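
Why the extra def is rejected can be sketched with a toy check. This is an illustrative assumption about the general idea, not the algorithm the TableGen decoder emitter actually uses (which also considers things like predicates and decoder namespaces); `encodingsConflict` is a hypothetical helper. Two fixed-width encodings cannot be told apart when no bit position holds a fixed `0` in one and a fixed `1` in the other; `_` stands for an operand bit, as in the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when the two encodings are indistinguishable by their fixed
// bits. '_' marks an operand (don't-care) bit; '0'/'1' are fixed opcode bits.
bool encodingsConflict(const std::string &a, const std::string &b) {
  assert(a.size() == b.size() && "encodings must have the same width");
  for (std::size_t i = 0; i < a.size(); ++i) {
    bool bothFixed = a[i] != '_' && b[i] != '_';
    if (bothFixed && a[i] != b[i])
      return false; // this bit tells the two instructions apart
  }
  return true;
}
```

Under this model, `SMULLv2i32_indexed` and `SMULLv2i32_indexed_low` conflict because their strings are identical, whereas the SMULL and UMULL encodings differ in a fixed bit. Adding a `Pat` instead avoids the problem because it creates no new instruction record to decode.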

Following @dmgreen's comment, updated patch.

Harbormaster completed remote builds in B246268: Diff 541591. · Jul 18 2023, 4:24 PM

Could we add a PatFrags that looks something like this:

def dup_v8i16 : PatFrags<(ops node:$LHS, node:$RHS),
                         [(v4i16 (extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 0))),
                          (v4i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS))]>;

So that it matches either a v4i16 DUPLANE, or a subvector_extract of a v8i16 DUPLANE. Maybe call it something like AArch64duplanev416, I'm not sure. It could then be used in the existing set patterns for all the lane instructions. I think many of them might hit the same problems we see in the tests, but don't happen to have tests for every instruction.

[(set (v4i32 V128:$Rd),
    (OpNode (v4i16 V64:$Rn),
            (dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))]> {
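
The "matches either form" behaviour of such a PatFrags can be mimicked in a small standalone sketch. The `Node` struct and `matchesDupV8i16` below are hypothetical and only model the shape check; real matching is generated by TableGen from the fragment definition, and this sketch does not check the type of the duplane source operand.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified DAG node.
struct Node {
  std::string op;              // "duplane16", "extract_subvector", ...
  std::string type;            // result type, e.g. "v4i16" or "v8i16"
  const Node *input = nullptr; // vector operand, if any
  int index = 0;               // start index for extract_subvector
};

// Mirrors the two alternatives of the suggested fragment:
//   (v4i16 (extract_subvector (v8i16 (AArch64duplane16 ...)), (i64 0)))
//   (v4i16 (AArch64duplane16 ...))
bool matchesDupV8i16(const Node &n) {
  if (n.type != "v4i16")
    return false;
  if (n.op == "duplane16")
    return true; // plain 64-bit DUPLANE
  // Low half of a 128-bit DUPLANE.
  return n.op == "extract_subvector" && n.index == 0 && n.input != nullptr &&
         n.input->op == "duplane16" && n.input->type == "v8i16";
}
```

An instruction pattern written against this predicate accepts the low-half extract produced by the combine as well as the original narrow DUPLANE, while the high-half extract (index 4) is left to the existing `extract_high_dup_v8i16` style fragments.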

In D155592#4514888, @dmgreen wrote:
Could we add a PatFrags that looks something like this:
def dup_v8i16 : PatFrags<(ops node:$LHS, node:$RHS),
                         [(v4i16 (extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 0))),
                          (v4i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS))]>;
So that it matches either a v4i16 DUPLANE, or a subvector_extract of a v8i16 DUPLANE. Maybe call it something like AArch64duplanev416, I'm not sure. It could then be used in the existing set patterns for all the lane instructions. I think many of them might hit the same problems we see in the tests, but don't happen to have tests for every instruction.
[(set (v4i32 V128:$Rd),
    (OpNode (v4i16 V64:$Rn),
            (dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))]> {

Yep, I agree with you. I had wanted to avoid touching the existing patterns if possible, because that could cause other regressions.
Let me update the patterns with PatFrags.

Following @dmgreen's comment, updated patterns with PatFrags.

Harbormaster completed remote builds in B246584: Diff 542072. · Jul 19 2023, 11:08 AM

Thanks. Looks pretty good. Do we need to handle the other "indexing" operations in the same way? For example something like this below, which is for fmul. I would guess that the extending operations (umull) are more likely to see the problem, but you can imagine the others in the "AdvSIMD indexed element" section of AArch64InstrInfo.td might all be better using dup_v8i16 now.

define <4 x float> @sel.v8i16(ptr %p, ptr %q, <4 x float> %a, <4 x float> %b, <2 x float> %c) {
  %splat = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> zeroinitializer
  %splat2 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> zeroinitializer
  
  %r = fmul <4 x float> %b, %splat
  %r2 = fmul <2 x float> %c, %splat2
  store <2 x float> %r2, ptr %p
  ret <4 x float> %r
}

In D155592#4517823, @dmgreen wrote:
Thanks. Looks pretty good. Do we need to handle the other "indexing" operations in the same way? For example something like this below, which is for fmul. I would guess that the extending operations (umull) are more likely to see the problem, but you can imagine the others in the "AdvSIMD indexed element" section of AArch64InstrInfo.td might all be better using dup_v8i16 now.
define <4 x float> @sel.v8i16(ptr %p, ptr %q, <4 x float> %a, <4 x float> %b, <2 x float> %c) {
  %splat = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> zeroinitializer
  %splat2 = shufflevector <4 x float> %a, <4 x float> poison, <2 x i32> zeroinitializer
  
  %r = fmul <4 x float> %b, %splat
  %r2 = fmul <2 x float> %c, %splat2
  store <2 x float> %r2, ptr %p
  ret <4 x float> %r
}

I have added dup_v8i16 and dup_v4i32 to the SIMDVectorIndexedLongSD and SIMDIndexedLongSD multiclasses.
umull uses SIMDVectorIndexedLongSD like smull, so it will be matched with dup_v8i16 and dup_v4i32.
Let me check the fmul example and the 64-bit AArch64duplane with VectorIndexS:$idx.

Following @dmgreen's comment, updated more patterns with PatFrags.

Cheers. LGTM, but can you try and add some tests for some of the other cases, maybe based on the code in the comment above. Just in case we need to change it in the future again, we will notice they are not as good. Thanks.

This revision is now accepted and ready to land. · Jul 20 2023, 5:13 AM

In D155592#4518396, @dmgreen wrote:

Cheers. LGTM, but can you try and add some tests for some of the other cases, maybe based on the code in the comment above. Just in case we need to change it in the future again, we will notice they are not as good. Thanks.

Yep, let me add some tests.

Harbormaster completed remote builds in B246848: Diff 542419. · Jul 20 2023, 6:35 AM

Added tests

Herald added a subscriber: arphaman. · Jul 20 2023, 7:48 AM

This revision was landed with ongoing or failed builds. · Jul 20 2023, 7:59 AM

Closed by commit rG8bad7ad6d6cb: [AArch64] Reuse larger DUPLANE if available (authored by jaykang10).

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rG8bad7ad6d6cb: [AArch64] Reuse larger DUPLANE if available.

Harbormaster completed remote builds in B246921: Diff 542520. · Jul 20 2023, 10:52 AM

Diff 541591

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


Show First 20 Lines • Show All 22,101 Lines • ▼ Show 20 Lines

static SDValue performDUPCombine(SDNode *N,		static SDValue performDUPCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
// If "v2i32 DUP(x)" and "v4i32 DUP(x)" both exist, use an extract from the		// If "v2i32 DUP(x)" and "v4i32 DUP(x)" both exist, use an extract from the
// 128bit vector version.		// 128bit vector version.
if (VT.is64BitVector() && DCI.isAfterLegalizeDAG()) {		if (VT.is64BitVector() && DCI.isAfterLegalizeDAG()) {
EVT LVT = VT.getDoubleNumVectorElementsVT(*DCI.DAG.getContext());		EVT LVT = VT.getDoubleNumVectorElementsVT(*DCI.DAG.getContext());
if (SDNode *LN = DCI.DAG.getNodeIfExists(		SmallVector<SDValue> Ops(N->ops());
N->getOpcode(), DCI.DAG.getVTList(LVT), {N->getOperand(0)})) {		if (SDNode *LN = DCI.DAG.getNodeIfExists(N->getOpcode(),
		DCI.DAG.getVTList(LVT), Ops)) {
SDLoc DL(N);		SDLoc DL(N);
return DCI.DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SDValue(LN, 0),		return DCI.DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SDValue(LN, 0),
DCI.DAG.getConstant(0, DL, MVT::i64));		DCI.DAG.getConstant(0, DL, MVT::i64));
}		}
}		}

		if (N->getOpcode() == AArch64ISD::DUP)
return performPostLD1Combine(N, DCI, false);		return performPostLD1Combine(N, DCI, false);

		return SDValue();
}		}

/// Get rid of unnecessary NVCASTs (that don't change the type).		/// Get rid of unnecessary NVCASTs (that don't change the type).
static SDValue performNVCASTCombine(SDNode *N) {		static SDValue performNVCASTCombine(SDNode *N) {
if (N->getValueType(0) == N->getOperand(0).getValueType())		if (N->getValueType(0) == N->getOperand(0).getValueType())
return N->getOperand(0);		return N->getOperand(0);

return SDValue();		return SDValue();
▲ Show 20 Lines • Show All 911 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case AArch64ISD::BRCOND:		case AArch64ISD::BRCOND:
return performBRCONDCombine(N, DCI, DAG);		return performBRCONDCombine(N, DCI, DAG);
case AArch64ISD::TBNZ:		case AArch64ISD::TBNZ:
case AArch64ISD::TBZ:		case AArch64ISD::TBZ:
return performTBZCombine(N, DCI, DAG);		return performTBZCombine(N, DCI, DAG);
case AArch64ISD::CSEL:		case AArch64ISD::CSEL:
return performCSELCombine(N, DCI, DAG);		return performCSELCombine(N, DCI, DAG);
case AArch64ISD::DUP:		case AArch64ISD::DUP:
		case AArch64ISD::DUPLANE8:
		case AArch64ISD::DUPLANE16:
		case AArch64ISD::DUPLANE32:
		case AArch64ISD::DUPLANE64:
return performDUPCombine(N, DCI);		return performDUPCombine(N, DCI);
case AArch64ISD::DUPLANE128:		case AArch64ISD::DUPLANE128:
return performDupLane128Combine(N, DAG);		return performDupLane128Combine(N, DAG);
case AArch64ISD::NVCAST:		case AArch64ISD::NVCAST:
return performNVCASTCombine(N);		return performNVCASTCombine(N);
case AArch64ISD::SPLICE:		case AArch64ISD::SPLICE:
return performSpliceCombine(N, DAG);		return performSpliceCombine(N, DAG);
case AArch64ISD::UUNPKLO:		case AArch64ISD::UUNPKLO:
▲ Show 20 Lines • Show All 2,932 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64InstrFormats.td


Show First 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	def extract_high_v4i32 :
ComplexPattern<v2i32, 1, "SelectExtractHigh", [extract_subvector, bitconvert]>;		ComplexPattern<v2i32, 1, "SelectExtractHigh", [extract_subvector, bitconvert]>;
def extract_high_v2i64 :		def extract_high_v2i64 :
ComplexPattern<v1i64, 1, "SelectExtractHigh", [extract_subvector, bitconvert]>;		ComplexPattern<v1i64, 1, "SelectExtractHigh", [extract_subvector, bitconvert]>;

def extract_high_dup_v8i16 :		def extract_high_dup_v8i16 :
BinOpFrag<(extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 4))>;		BinOpFrag<(extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 4))>;
def extract_high_dup_v4i32 :		def extract_high_dup_v4i32 :
BinOpFrag<(extract_subvector (v4i32 (AArch64duplane32 (v4i32 node:$LHS), node:$RHS)), (i64 2))>;		BinOpFrag<(extract_subvector (v4i32 (AArch64duplane32 (v4i32 node:$LHS), node:$RHS)), (i64 2))>;
		def extract_low_dup_v8i16 :
		BinOpFrag<(extract_subvector (v8i16 (AArch64duplane16 (v8i16 node:$LHS), node:$RHS)), (i64 0))>;
		def extract_low_dup_v4i32 :
		BinOpFrag<(extract_subvector (v4i32 (AArch64duplane32 (v4i32 node:$LHS), node:$RHS)), (i64 0))>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Asm Operand Classes.		// Asm Operand Classes.
//		//

// Shifter operand for arithmetic shifted encodings.		// Shifter operand for arithmetic shifted encodings.
def ShifterOperand : AsmOperandClass {		def ShifterOperand : AsmOperandClass {
let Name = "Shifter";		let Name = "Shifter";
▲ Show 20 Lines • Show All 8,868 Lines • ▼ Show 20 Lines	multiclass SIMDIndexedLongSD<bit U, bits<4> opc, string asm,

def v1i64_indexed : BaseSIMDIndexed<1, U, 1, 0b10, opc,		def v1i64_indexed : BaseSIMDIndexed<1, U, 1, 0b10, opc,
FPR64Op, FPR32Op, V128, VectorIndexS,		FPR64Op, FPR32Op, V128, VectorIndexS,
asm, ".s", "", "", ".s", []> {		asm, ".s", "", "", ".s", []> {
bits<2> idx;		bits<2> idx;
let Inst{11} = idx{1};		let Inst{11} = idx{1};
let Inst{21} = idx{0};		let Inst{21} = idx{0};
}		}

		def : Pat<(v4i32 (OpNode (v4i16 V64:$Rn),
		(extract_low_dup_v8i16 (v8i16 V128:$Rm), VectorIndexS:$idx))),
		(!cast<Instruction>(NAME # "v4i16_indexed") V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
		def : Pat<(v2i64 (OpNode (v2i32 V64:$Rn),
		(extract_low_dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx))),
		(!cast<Instruction>(NAME # "v2i32_indexed") V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
}		}

multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,		multiclass SIMDIndexedLongSQDMLXSDTied<bit U, bits<4> opc, string asm,
SDPatternOperator Accum> {		SDPatternOperator Accum> {
def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,		def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
V128, V64,		V128, V64,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
asm, ".4s", ".4s", ".4h", ".h",		asm, ".4s", ".4s", ".4h", ".h",
▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	def v1i64_indexed : BaseSIMDIndexedTied<1, U, 1, 0b10, opc,
bits<2> idx;		bits<2> idx;
let Inst{11} = idx{1};		let Inst{11} = idx{1};
let Inst{21} = idx{0};		let Inst{21} = idx{0};
}		}
}		}

multiclass SIMDVectorIndexedLongSD<bit U, bits<4> opc, string asm,		multiclass SIMDVectorIndexedLongSD<bit U, bits<4> opc, string asm,
SDPatternOperator OpNode> {		SDPatternOperator OpNode> {
let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc,		def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc,
V128, V64,		V128, V64,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
asm, ".4s", ".4s", ".4h", ".h",		asm, ".4s", ".4s", ".4h", ".h",
[(set (v4i32 V128:$Rd),		[(set (v4i32 V128:$Rd),
(OpNode (v4i16 V64:$Rn),		(OpNode (v4i16 V64:$Rn),
(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {		(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {
bits<3> idx;		bits<3> idx;
let Inst{11} = idx{2};		let Inst{11} = idx{2};
let Inst{21} = idx{1};		let Inst{21} = idx{1};
let Inst{20} = idx{0};		let Inst{20} = idx{0};
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
}		}

def v8i16_indexed : BaseSIMDIndexed<1, U, 0, 0b01, opc,		def v8i16_indexed : BaseSIMDIndexed<1, U, 0, 0b01, opc,
V128, V128,		V128, V128,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
asm#"2", ".4s", ".4s", ".8h", ".h",		asm#"2", ".4s", ".4s", ".8h", ".h",
[(set (v4i32 V128:$Rd),		[(set (v4i32 V128:$Rd),
(OpNode (extract_high_v8i16 (v8i16 V128:$Rn)),		(OpNode (extract_high_v8i16 (v8i16 V128:$Rn)),
(extract_high_dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))]> {		(extract_high_dup_v8i16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx)))]> {

bits<3> idx;		bits<3> idx;
let Inst{11} = idx{2};		let Inst{11} = idx{2};
let Inst{21} = idx{1};		let Inst{21} = idx{1};
let Inst{20} = idx{0};		let Inst{20} = idx{0};
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
}		}

def v2i32_indexed : BaseSIMDIndexed<0, U, 0, 0b10, opc,		def v2i32_indexed : BaseSIMDIndexed<0, U, 0, 0b10, opc,
V128, V64,		V128, V64,
V128, VectorIndexS,		V128, VectorIndexS,
asm, ".2d", ".2d", ".2s", ".s",		asm, ".2d", ".2d", ".2s", ".s",
[(set (v2i64 V128:$Rd),		[(set (v2i64 V128:$Rd),
(OpNode (v2i32 V64:$Rn),		(OpNode (v2i32 V64:$Rn),
(v2i32 (AArch64duplane32 (v4i32 V128:$Rm), VectorIndexS:$idx))))]> {		(v2i32 (AArch64duplane32 (v4i32 V128:$Rm), VectorIndexS:$idx))))]> {
bits<2> idx;		bits<2> idx;
let Inst{11} = idx{1};		let Inst{11} = idx{1};
let Inst{21} = idx{0};		let Inst{21} = idx{0};
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
}		}

def v4i32_indexed : BaseSIMDIndexed<1, U, 0, 0b10, opc,		def v4i32_indexed : BaseSIMDIndexed<1, U, 0, 0b10, opc,
V128, V128,		V128, V128,
V128, VectorIndexS,		V128, VectorIndexS,
asm#"2", ".2d", ".2d", ".4s", ".s",		asm#"2", ".2d", ".2d", ".4s", ".s",
[(set (v2i64 V128:$Rd),		[(set (v2i64 V128:$Rd),
(OpNode (extract_high_v4i32 (v4i32 V128:$Rn)),		(OpNode (extract_high_v4i32 (v4i32 V128:$Rn)),
(extract_high_dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx)))]> {		(extract_high_dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx)))]> {
bits<2> idx;		bits<2> idx;
let Inst{11} = idx{1};		let Inst{11} = idx{1};
let Inst{21} = idx{0};		let Inst{21} = idx{0};
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
}		}
}
		def : Pat<(v4i32 (OpNode (v4i16 V64:$Rn),
		(extract_low_dup_v8i16 (v8i16 V128:$Rm), VectorIndexS:$idx))),
		(!cast<Instruction>(NAME # "v4i16_indexed") V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
		def : Pat<(v2i64 (OpNode (v2i32 V64:$Rn),
		(extract_low_dup_v4i32 (v4i32 V128:$Rm), VectorIndexS:$idx))),
		(!cast<Instruction>(NAME # "v2i32_indexed") V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
}		}

multiclass SIMDVectorIndexedLongSDTied<bit U, bits<4> opc, string asm,		multiclass SIMDVectorIndexedLongSDTied<bit U, bits<4> opc, string asm,
SDPatternOperator OpNode> {		SDPatternOperator OpNode> {
let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {		let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,		def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
V128, V64,		V128, V64,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
▲ Show 20 Lines • Show All 2,965 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll

	Show First 20 Lines • Show All 651 Lines • ▼ Show 20 Lines

	exit:			exit:
	ret void			ret void
	}			}

	define void @sink_v16s16_8(i32 %p, i32 %d, i64 %n, <16 x i8> %a) {			define void @sink_v16s16_8(i32 %p, i32 %d, i64 %n, <16 x i8> %a) {
	; CHECK-LABEL: sink_v16s16_8:			; CHECK-LABEL: sink_v16s16_8:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: dup v1.8b, v0.b[10]
	; CHECK-NEXT: mov x8, xzr			; CHECK-NEXT: mov x8, xzr
	; CHECK-NEXT: dup v0.16b, v0.b[10]			; CHECK-NEXT: dup v0.16b, v0.b[10]
	; CHECK-NEXT: .LBB9_1: // %loop			; CHECK-NEXT: .LBB9_1: // %loop
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldr q2, [x0]			; CHECK-NEXT: ldr q1, [x0]
	; CHECK-NEXT: add x8, x8, #8			; CHECK-NEXT: add x8, x8, #8
	; CHECK-NEXT: subs x2, x2, #8			; CHECK-NEXT: subs x2, x2, #8
	; CHECK-NEXT: smull2 v3.8h, v2.16b, v0.16b			; CHECK-NEXT: smull2 v2.8h, v1.16b, v0.16b
	; CHECK-NEXT: smull v2.8h, v2.8b, v1.8b			; CHECK-NEXT: smull v1.8h, v1.8b, v0.8b
	; CHECK-NEXT: cmlt v3.8h, v3.8h, #0
	; CHECK-NEXT: cmlt v2.8h, v2.8h, #0			; CHECK-NEXT: cmlt v2.8h, v2.8h, #0
	; CHECK-NEXT: uzp1 v2.16b, v2.16b, v3.16b			; CHECK-NEXT: cmlt v1.8h, v1.8h, #0
	; CHECK-NEXT: str q2, [x0], #32			; CHECK-NEXT: uzp1 v1.16b, v1.16b, v2.16b
				; CHECK-NEXT: str q1, [x0], #32
	; CHECK-NEXT: b.ne .LBB9_1			; CHECK-NEXT: b.ne .LBB9_1
	; CHECK-NEXT: // %bb.2: // %exit			; CHECK-NEXT: // %bb.2: // %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%ext = sext <16 x i8> %a to <16 x i16>			%ext = sext <16 x i8> %a to <16 x i16>
	%broadcast.splat = shufflevector <16 x i16> %ext, <16 x i16> poison, <16 x i32> <i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10>			%broadcast.splat = shufflevector <16 x i16> %ext, <16 x i16> poison, <16 x i32> <i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10, i32 10>
	br label %loop			br label %loop

	▲ Show 20 Lines • Show All 292 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Reuse larger DUPLANE if available
ClosedPublic
