This is an archive of the discontinued LLVM Phabricator instance.

[TTI][AArch64] Update vector extract cost for Neoverse-N1.
Abandoned · Public

Authored by vporpo on Aug 18 2022, 3:52 PM.

Details

Summary

According to the ARM Neoverse-N1 Software Optimization Guide, the extract
instructions have a latency of 2 and a throughput of 2, yet TTI returns a
cost of 3, which seems too high.

For comparison, PEXTR on x86 has a latency of 3 and a throughput of 1 on
Skylake, according to https://www.agner.org/optimize/instruction_tables.pdf,
yet TTI returns a cost of 1.

This patch sets the vector extract cost to 1.
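
For context, a sketch of what the change would look like (paraphrased, not the verbatim diff; the VectorInsertExtractBaseCost member and its default of 3 come from AArch64Subtarget.h, but the exact details may differ):

```cpp
// In AArch64Subtarget::initializeProperties(), the Neoverse-N1 case would
// override the default lane-move cost declared in AArch64Subtarget.h.
case NeoverseN1:
  // ... existing Neoverse-N1 tuning ...
  VectorInsertExtractBaseCost = 1; // default is 3
  break;
```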

Diff Detail

Event Timeline

vporpo created this revision. · Aug 18 2022, 3:52 PM
Herald added a project: Restricted Project. · View Herald Transcript · Aug 18 2022, 3:52 PM
vporpo requested review of this revision. · Aug 18 2022, 3:52 PM
mingmingl added inline comments.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
191

nit: this changes the cost for both extract and insert, while the summary mostly mentions the EXT instruction cost. It might be good to call out that INS also has a latency of 2 and a throughput of 2 (unless it's a common assumption that the extract and insert instructions have the same cost).

Also, from the studies in D128302, I think the cost of extract/insert is better modeled by taking the user instruction into account (e.g., if the user instruction can access the lane directly, the extract could be folded into the user in the emitted code and have no cost). Nevertheless, my gut feeling is that 3 is a high number for instructions with a latency of 2 and a throughput of 2; I'm not sure whether 1 is too small. A toy sketch of the user-aware idea follows.
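
Hypothetical code, not the model from D128302 or LLVM's actual API:

```cpp
// Toy sketch of user-aware extract costing; hypothetical, not LLVM code.
// An extract whose user can address the lane directly (e.g. a by-lane
// multiply-accumulate) folds away in codegen; otherwise it pays the base
// vector-to-GPR move cost.
unsigned extractCostGivenUser(bool UserReadsLaneDirectly, unsigned BaseCost) {
  return UserReadsLaneDirectly ? 0 : BaseCost;
}
```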

vporpo added inline comments.Aug 18 2022, 8:21 PM
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
191

Yes, I need to update the description and add a test for the insertelement instructions too.

Yeah, considering the user instruction is definitely more precise.

I think a cost of 1 may be all right as long as only one instruction is needed for the extraction. I believe this is the logic behind the x86 extract cost calculation: it returns 1 if a single instruction suffices, and a higher cost if more instructions are needed.
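
A toy sketch of that logic (hypothetical; not the actual X86TTIImpl code, and the HasPEXTR flag is an assumption for illustration):

```cpp
// Toy model of the x86-style extract cost logic described above.
unsigned vectorExtractCost(unsigned Index, bool HasPEXTR) {
  if (Index == 0)
    return 1; // lane 0 is a plain MOVD/MOVQ out of the vector register
  if (HasPEXTR)
    return 1; // a single PEXTRB/W/D/Q suffices (SSE4.1 and later)
  return 2;   // shuffle the lane down to position 0, then move it out
}
```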

david-arm added a comment.

Hi @vporpo, the change seems sensible to me. Have you built/run any benchmarks to see what effect this change has on Neoverse-N1? I imagine it might make a difference to the output from the SLPVectorizer, for example.

fhahn added a subscriber: fhahn. · Aug 19 2022, 2:25 AM
fhahn added inline comments.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
191

I'd expect the extract cost to be similar on most recent-ish AArch64 cores. Should this be changed for more cores than just a single one?

dmgreen added a comment.

I've tried setting the default VectorInsertExtractBaseCost to 2 in the past, but have only ever seen performance regressions. From what I remember, they were large and persuasive enough to discourage me from considering it any further.

The VectorInsertExtractBaseCost (used via getVectorInstrCost) is, to an extent, not really a measure of the throughput of vector<->GPR moves. More importantly, it controls how much vector shuffling you are willing to accept at the expense of just using scalar code. On a machine like the N1, with 4 scalar pipelines and 2 SIMD pipelines, I'm not sure it makes a lot of sense to target SIMD more aggressively.
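
To make that concrete, a toy model (assumed numbers, not LLVM code) of how the base lane-move cost scales what a helper like getScalarizationOverhead would report for a small vector:

```cpp
#include <cstdio>
#include <initializer_list>

// Toy model, not LLVM code: scalarizing an N-lane value needs roughly one
// lane move per lane, so the base insert/extract cost linearly scales how
// expensive the vector<->scalar boundary looks to the vectorizer.
static unsigned scalarizationOverhead(unsigned NumLanes, unsigned BaseCost) {
  return NumLanes * BaseCost;
}

int main() {
  for (unsigned Base : {1u, 2u, 3u}) // 3 is the AArch64 default at issue
    std::printf("4 lanes at base cost %u -> overhead %u\n", Base,
                scalarizationOverhead(4, Base));
  return 0;
}
```

With a base cost of 3, a 4-lane round trip already looks like 12 units, which is why lowering it to 1 can flip many SLP decisions toward SIMD.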

The cost model under AArch64 is pretty rough in places, and codegen is slowly getting better over time, so there is certainly room for improvement, but I think it would take quite a lot to convince me that this is the right way forward.

vporpo abandoned this revision. · Aug 19 2022, 10:00 AM

@david-arm I have not done extensive testing; it just looked strange to me that the cost is so high compared to x86. But I think @dmgreen's point makes sense, so we should probably not change the cost for now.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
191

Yeah, it should be similar, but I think I would agree with @dmgreen.

@dmgreen The extract cost's default value and how the cost is used are two orthogonal things. I suspect that the regression you saw with a cost of 2 might be caused by a hidden bug somewhere else -- in other words, setting the cost to 3 simply papers over the real issue.

I think this patch is probably the right way to go. An alternative is to introduce an internal option to control the default value.

dmgreen added a comment.

> @dmgreen The extract cost's default value and how the cost is used are two orthogonal things. I suspect that the regression you saw with a cost of 2 might be caused by a hidden bug somewhere else -- in other words, setting the cost to 3 simply papers over the real issue.

If I'm understanding what you are saying correctly, I somewhat agree. The mid-end is made up of a lot of imperfect heuristics, with a cost model that is not exact, and some of the decisions made are not always optimal. Setting the cost of extracts to 1 or 2 makes the compiler favour SIMD over scalar code, through the costs reported by methods like getScalarizationOverhead and getVectorInstrCost. That can be beneficial in places, but it can also hurt performance. And in practice, on average over many benchmarks, it seems to hurt performance more than it helps.

> I think this patch is probably the right way to go. An alternative is to introduce an internal option to control the default value.

There was an -aarch64-insert-extract-base-cost option added in D124835. Does that serve your purpose?
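
For reference, such knobs follow the usual cl::opt pattern; a sketch of roughly how it would be declared (see D124835 for the real code, which may differ in naming and defaults):

```cpp
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Sketch of the cl::opt pattern for the knob mentioned above; the exact
// declaration lives in the AArch64 backend and may differ in its details.
static cl::opt<unsigned> OverrideVectorInsertExtractBaseCost(
    "aarch64-insert-extract-base-cost", cl::init(~0U), cl::Hidden,
    cl::desc("Base cost of vector insert/extract element"));
```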

@dmgreen Yes, that is a good summary. The option seems good enough for now.