This is an archive of the discontinued LLVM Phabricator instance.

Differential D155459

[AArch64] Change the cost of vector insert/extract to 2
Closed, Public

Authored by dmgreen on Jul 17 2023, 6:49 AM.

Details

Reviewers

SjoerdMeijer
samtebbs
fhahn
t.p.northover
Allen
nilanjana_basu
efriedma
peterwaller-arm

Commits

rG2a859b201461: [AArch64] Change the cost of vector insert/extract to 2

Summary

The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of "unusual" SLP vectorization where the codegen ends up being quite poor, but has always been higher than the "correct" costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing).

The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments to ScalarizationOverhead will increase the cost for integers, especially for small vectors. The end result will be a lower cost for float and long-integer types, and a somewhat higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizers.
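As a rough illustration of the paragraph above, the following sketch compares the scalarization overhead before and after the change. It assumes, as the summary implies, that the old model charged every lane except lane 0 for both integer and float vectors, and that the new model charges every lane for integers. The helper names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>

// Old and new values of VectorInsertExtractBaseCost from this patch.
const unsigned OldBaseCost = 3;
const unsigned NewBaseCost = 2;

// Hypothetical helper: overhead when lane 0 is treated as free
// (assumed old behaviour for all types, new behaviour for float).
unsigned overheadLane0Free(unsigned NumLanes, bool Insert, bool Extract,
                           unsigned BaseCost) {
  assert(NumLanes >= 1 && "a vector has at least one lane");
  return (NumLanes - 1) * (unsigned(Insert) + unsigned(Extract)) * BaseCost;
}

// Hypothetical helper: overhead when every lane is charged
// (new behaviour for integer vectors).
unsigned overheadAllLanes(unsigned NumLanes, bool Insert, bool Extract,
                          unsigned BaseCost) {
  assert(NumLanes >= 1 && "a vector has at least one lane");
  return NumLanes * (unsigned(Insert) + unsigned(Extract)) * BaseCost;
}
```

Under these assumptions, inserting and extracting all lanes of a 2-lane vector goes from 6 to 4 for float and from 6 to 8 for integer, while a 4-lane vector goes from 18 to 12 for float and from 18 to 16 for integer. That is consistent with the summary's "lower cost for float and long-integer types" and higher cost for small integer vectors.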

We may learn to regret this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Jul 17 2023, 6:49 AM

Herald added a project: Restricted Project. Jul 17 2023, 6:49 AM

Herald added subscribers: StephenFan, hiraditya, kristof.beyls.

dmgreen requested review of this revision. Jul 17 2023, 6:49 AM

Herald added a project: Restricted Project. Jul 17 2023, 6:49 AM

Herald added a subscriber: wangpc.

dmgreen edited the summary of this revision. Jul 17 2023, 7:03 AM

SjoerdMeijer added inline comments. Jul 17 2023, 7:21 AM

llvm/test/Analysis/CostModel/AArch64/bswap.ll
13–14	Just a bit of a drive by question first. It's not really caused by this change, I think, but it looks like the cost modelling was already a bit off for these bswaps? https://godbolt.org/z/d1s4ToP1G Or am I missing something?

I'm not currently the right person to review this, resigning.

Noted a couple regressions on testcases which explicitly say they're not supposed to be vectorized.

A lot of the numbers for intrinsics are pretty clearly off by a very large amount. If nobody is going to look at them, maybe we should just kill off the tests in question so reviewers don't have to read meaningless updates to them?

llvm/test/Analysis/CostModel/AArch64/fshr.ll
183	Weird cost modeling.
llvm/test/Analysis/CostModel/AArch64/getIntrinsicInstrCost-vector-reverse.ll
25	Weird cost modeling.
llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
35	Weird cost modeling.
llvm/test/Analysis/CostModel/AArch64/vector-select.ll
692	Cost modeling is weird.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
52–53	This cost modeling is weird.
llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll
204	Regression?
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
521	Regression?

fhahn added a reviewer: nilanjana_basu. Jul 17 2023, 8:15 AM

Harbormaster completed remote builds in B245831: Diff 540925. Jul 17 2023, 11:24 AM

> A lot of the numbers for intrinsics are pretty clearly off by a very large amount. If nobody is going to look at them, maybe we should just kill off the tests in question so reviewers don't have to read meaningless updates to them?

A lot of them are at least roughly correct even if they look odd, and it is good to have the test coverage.

llvm/test/Analysis/CostModel/AArch64/bswap.ll
13–14	I don't know if anyone has looked into getting vector bswap costs correct in the past, just scalar. These will just be costed as if they were scalarized, so the changes here are really just saying that the scalarization overhead is changing a little. I can look into improving some of them, but like you say it's a bit unrelated to this change.
llvm/test/Analysis/CostModel/AArch64/fshr.ll
183	Yep I think we have only looked at cost modelling for constant funnel shifts, and those were added fairly recently. I believe the codegen should also be improved for the variable case.
llvm/test/Analysis/CostModel/AArch64/getIntrinsicInstrCost-vector-reverse.ll
25	I agree, but the codegen looks odd without +bf16 too: https://godbolt.org/z/oTG5ae6nP
llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
35	Do you mean because of the tbl? We have never costed tbls as cheap. I'm not sure if that would be profitable or not, and feels very much like a different issue.
llvm/test/Analysis/CostModel/AArch64/vector-select.ll
692	Because it is too low? It is scalarized without +fullfp16. That codegen could be better, and it looks like the cost is a bit low, not accounting for the scalarization cost of the extracts. I don't think we have focussed much in the past on the combination of fp16 code without fullfp16.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
52–53	This is an in-order reduction.
llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll
204	I had looked into these. This test case was added without the underlying issue being fixed (fusing fmul and fadd). The tests were changed by the increased cost of ld1r instructions. In this case it is just profitable again now. You can see it picks 2x vectorization though, not 4x, which seems to come from the `insertelement <2 x float> <float poison, float 3.000000e+00>, float [[X:%.*]], i32 0`, which counts as the cost of a constant vector with x inserted into the bottom lane, both of which are incorrectly counted as zero.
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
521	The cost of an extract of lane zero is still 0 (which is known to be wrong, but doesn't look like something we can change without causing too many regressions). I was really hoping to remove it for integer types at the same time as this, but it looks like it causes too many problems to remove. I'm hoping that can be improved in the future, and that will hopefully be easier if the base scalar cost is lower. This seems to be already handled by instruction selection https://godbolt.org/z/7GEcxo8WT, so shouldn't be a problem on its own. I can change the test to use lane 1 to show it still applies for other lanes.

efriedma added inline comments. Jul 19 2023, 1:24 AM

llvm/test/Analysis/CostModel/AArch64/getIntrinsicInstrCost-vector-reverse.ll
25	Still way overestimating the cost... but yes.
llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll
35	We should be modeling the fact that tbl exists, at least. (I mean, it doesn't need to be super-cheap, but basically all ARM chips have a reasonably fast tbl.)
llvm/test/Analysis/CostModel/AArch64/vector-select.ll
692	Wait, we scalarize this? I thought I checked this, but must not have. We really shouldn't scalarize, though.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
52–53	Then why does it cost 1 at VF 4?
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
521	Okay.

SjoerdMeijer added inline comments. Jul 20 2023, 5:50 AM

llvm/test/Analysis/CostModel/AArch64/vector-select.ll
692	Without fullfp16 support, which is what this is checking with "COST-NOFP16-NEXT", I expect this to get scalarised.

dmgreen added inline comments. Jul 27 2023, 1:22 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
52–53	For some reason it prints the costs twice, and these are matching two different things. There are some details in 2e0bf67df1437cb0156d7f5dd9e1b701749f96ca. I'll rebase on the updated costs.

Rebase over adjusted costs.

Harbormaster completed remote builds in B248468: Diff 544649. Jul 27 2023, 3:22 AM

LGTM. I'll trust your benchmarking on this.

llvm/test/Analysis/CostModel/AArch64/cttz.ll
108	Worth noting the actual cost here is 4 instructions. (We don't scalarize it; we lower using `cnt`.)

This revision is now accepted and ready to land. Jul 27 2023, 11:10 AM

mingmingl added a subscriber: mingmingl. Jul 27 2023, 12:28 PM

Thanks.

This has a fairly high chance of causing some problems somewhere. Please send reports of any regressions and I can see about addressing them.

This revision was landed with ongoing or failed builds. Jul 28 2023, 1:27 PM

Closed by commit rG2a859b201461: [AArch64] Change the cost of vector insert/extract to 2 (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG2a859b201461: [AArch64] Change the cost of vector insert/extract to 2.

labrinea mentioned this in D133441: [SLP] Vectorize mutual horizontal reductions. Sep 4 2023, 2:09 AM

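Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64Subtarget.h	2 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	5 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	12 lines
llvm/test/Analysis/CostModel/AArch64/arith-fp.ll	16 lines
llvm/test/Analysis/CostModel/AArch64/arith-overflow.ll	12 lines
llvm/test/Analysis/CostModel/AArch64/bswap.ll	2 lines
llvm/test/Analysis/CostModel/AArch64/cast.ll	116 lines
llvm/test/Analysis/CostModel/AArch64/cmp.ll	2 lines
llvm/test/Analysis/CostModel/AArch64/ctlz.ll	4 lines
llvm/test/Analysis/CostModel/AArch64/cttz.ll	28 lines
llvm/test/Analysis/CostModel/AArch64/div.ll	216 lines
llvm/test/Analysis/CostModel/AArch64/fptoi_sat.ll	102 lines
llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll	2 lines
llvm/test/Analysis/CostModel/AArch64/fshl.ll	14 lines
llvm/test/Analysis/CostModel/AArch64/fshr.ll	14 lines
llvm/test/Analysis/CostModel/AArch64/getIntrinsicInstrCost-vector-reverse.ll	4 lines
llvm/test/Analysis/CostModel/AArch64/insert-extract.ll	60 lines
llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll	36 lines
llvm/test/Analysis/CostModel/AArch64/mem-op-cost-model.ll	48 lines
llvm/test/Analysis/CostModel/AArch64/min-max.ll	32 lines
llvm/test/Analysis/CostModel/AArch64/reduce-fadd.ll	98 lines
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll	32 lines
llvm/test/Analysis/CostModel/AArch64/rem.ll	336 lines
llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll	28 lines
llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll	30 lines
llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll	6 lines
llvm/test/Analysis/CostModel/AArch64/sve-insert-extract.ll	48 lines
llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll	34 lines
llvm/test/Analysis/CostModel/AArch64/vector-select.ll	16 lines
llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/interleaved-vs-scalar.ll	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll	6 lines
llvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll	4 lines
llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll	34 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll	16 lines
llvm/test/Transforms/LoopVectorize/AArch64/unsafe-vf-hint-remark.ll	7 lines
llvm/test/Transforms/LowerMatrixIntrinsics/dot-product-float.ll	95 lines
llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll	47 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll	2 lines
llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll	2 lines
llvm/test/Transforms/SLPVectorizer/AArch64/landing_pad.ll	4 lines
llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll	80 lines
llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll	33 lines
llvm/test/Transforms/SLPVectorizer/AArch64/multiple_reduction.ll	639 lines
llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll	16 lines
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll	13 lines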

Diff 545259

llvm/lib/Target/AArch64/AArch64Subtarget.h

(97 lines not shown; enclosing context: protected:)
   unsigned MinVectorRegisterBitWidth = 64;

 // Bool members corresponding to the SubtargetFeatures defined in tablegen
 #define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER) \
   bool ATTRIBUTE = DEFAULT;
 #include "AArch64GenSubtargetInfo.inc"

   uint8_t MaxInterleaveFactor = 2;
-  uint8_t VectorInsertExtractBaseCost = 3;
+  uint8_t VectorInsertExtractBaseCost = 2;
   uint16_t CacheLineSize = 0;
   uint16_t PrefetchDistance = 0;
   uint16_t MinPrefetchStride = 1;
   unsigned MaxPrefetchIterationsAhead = UINT_MAX;
   Align PrefFunctionAlignment;
   Align PrefLoopAlignment;
   unsigned MaxBytesForLoopAlignment = 0;
   unsigned MaxJumpTableSize = 0;
(316 lines not shown)

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

(379 lines not shown; enclosing context: InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,)
                                              TTI::TargetCostKind CostKind);

   InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,
                                  ArrayRef<int> Mask,
                                  TTI::TargetCostKind CostKind, int Index,
                                  VectorType *SubTp,
                                  ArrayRef<const Value *> Args = std::nullopt);

+  InstructionCost getScalarizationOverhead(VectorType *Ty,
+                                           const APInt &DemandedElts,
+                                           bool Insert, bool Extract,
+                                           TTI::TargetCostKind CostKind);
+
   /// Return the cost of the scaling factor used in the addressing
   /// mode represented by AM for this target, for a load/store
   /// of the specified type.
   /// If the AM is supported, the return value must be >= 0.
   /// If the AM is not supported, it returns a negative value.
   InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                        int64_t BaseOffset, bool HasBaseReg,
                                        int64_t Scale, unsigned AddrSpace) const;
(17 lines not shown)

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

(2,554 lines not shown)

 InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I,
                                                    Type *Val,
                                                    TTI::TargetCostKind CostKind,
                                                    unsigned Index) {
   return getVectorInstrCostHelper(&I, Val, Index, true /* HasRealUse */);
 }

+InstructionCost AArch64TTIImpl::getScalarizationOverhead(
+    VectorType *Ty, const APInt &DemandedElts, bool Insert, bool Extract,
+    TTI::TargetCostKind CostKind) {
+  if (isa<ScalableVectorType>(Ty))
+    return InstructionCost::getInvalid();
+  if (Ty->getElementType()->isFloatingPointTy())
+    return BaseT::getScalarizationOverhead(Ty, DemandedElts, Insert, Extract,
+                                           CostKind);
+  return DemandedElts.popcount() * (Insert + Extract) *
+         ST->getVectorInsertExtractBaseCost();
+}
+
 InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
     ArrayRef<const Value *> Args,
     const Instruction *CxtI) {

   // TODO: Handle more cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
(1,236 lines not shown)
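The new getScalarizationOverhead override can be modelled outside LLVM as a small standalone function. This is only a sketch: `VecDesc`, `InvalidCost` and `scalarizationOverhead` are hypothetical stand-ins for `VectorType`, `InstructionCost::getInvalid()` and the real method, the demanded-elements mask is a plain 64-bit integer rather than an `APInt`, and the float branch assumes the base implementation charges every demanded lane except lane 0.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the VectorType properties the override reads.
struct VecDesc {
  bool Scalable;  // true for scalable (SVE) vectors
  bool FloatElts; // true when the element type is floating point
};

// Hypothetical stand-in for InstructionCost::getInvalid().
const int InvalidCost = -1;

int scalarizationOverhead(const VecDesc &Ty, std::uint64_t DemandedElts,
                          bool Insert, bool Extract, int BaseCost = 2) {
  assert(BaseCost >= 0 && "base cost must be non-negative");
  // Scalable vectors cannot be scalarized lane by lane.
  if (Ty.Scalable)
    return InvalidCost;
  int Ops = int(Insert) + int(Extract);
  if (Ty.FloatElts) {
    // Assumption: the generic path treats lane 0 as free for float.
    std::uint64_t NonZeroLanes = DemandedElts & ~std::uint64_t(1);
    return int(std::bitset<64>(NonZeroLanes).count()) * Ops * BaseCost;
  }
  // Integer vectors: every demanded lane pays for each insert/extract.
  return int(std::bitset<64>(DemandedElts).count()) * Ops * BaseCost;
}
```

For example, with all four lanes of a fixed-width vector demanded (mask `0xF`) and both Insert and Extract set, this sketch gives 16 for an integer vector and 12 for a float vector at the new base cost of 2.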

llvm/test/Analysis/CostModel/AArch64/arith-fp.ll

(192 lines not shown)
   %V4F64 = fdiv <4 x double> undef, undef

   ret i32 undef
 }

 define i32 @frem(i32 %arg) {
 ; CHECK-LABEL: 'frem'
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F16 = frem half undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V4F16 = frem <4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V8F16 = frem <8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 116 for instruction: %V16F16 = frem <16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4F16 = frem <4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V8F16 = frem <8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V16F16 = frem <16 x half> undef, undef
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2F32 = frem <2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V4F32 = frem <4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V8F32 = frem <8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2F32 = frem <2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4F32 = frem <4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8F32 = frem <8 x float> undef, undef
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2F64 = frem <2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4F64 = frem <4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2F64 = frem <2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4F64 = frem <4 x double> undef, undef
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
   %F16 = frem half undef, undef
   %V4F16 = frem <4 x half> undef, undef
   %V8F16 = frem <8 x half> undef, undef
   %V16F16 = frem <16 x half> undef, undef

   %F32 = frem float undef, undef
(194 lines not shown)
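The frem numbers above fit a simple pattern, which makes the updates easier to sanity-check. The sketch below is a reverse-engineered fit to these CHECK lines, not the LLVM implementation: it assumes each legal vector register is scalarized into per-lane frem operations costing 2 each, plus one round of inserts and one round of extracts with lane 0 free. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical fit: cost of a scalarized frem on one legal vector register.
// ScalarCost is the cost the test reports for a scalar frem (2).
unsigned scalarizedFremCost(unsigned NumLanes, unsigned BaseCost,
                            unsigned ScalarCost = 2) {
  assert(NumLanes >= 1 && "a vector has at least one lane");
  // One insert and one extract per lane, except lane 0 (free for float).
  unsigned Overhead = 2 * (NumLanes - 1) * BaseCost;
  return NumLanes * ScalarCost + Overhead;
}

// Hypothetical fit: wider vectors are costed as several legal registers.
unsigned legalizedFremCost(unsigned TotalLanes, unsigned LanesPerRegister,
                           unsigned BaseCost) {
  assert(LanesPerRegister >= 1 && TotalLanes % LanesPerRegister == 0);
  return (TotalLanes / LanesPerRegister) *
         scalarizedFremCost(LanesPerRegister, BaseCost);
}
```

With a base cost of 3 this reproduces the old values (10 for `<2 x float>`, 26 for `<4 x float>`, 58 for `<8 x half>`), and with a base cost of 2 the new ones (8, 20 and 44). The wider types come out as two registers' worth, for example 88 for `<16 x half>`.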

llvm/test/Analysis/CostModel/AArch64/arith-overflow.ll

(349 lines not shown)
 declare {i8, i1} @llvm.smul.with.overflow.i8(i8, i8)
 declare {<16 x i8>, <16 x i1>} @llvm.smul.with.overflow.v16i8(<16 x i8>, <16 x i8>)
 declare {<32 x i8>, <32 x i1>} @llvm.smul.with.overflow.v32i8(<32 x i8>, <32 x i8>)
 declare {<64 x i8>, <64 x i1>} @llvm.smul.with.overflow.v64i8(<64 x i8>, <64 x i8>)

 define i32 @smul(i32 %arg) {
 ; RECIP-LABEL: 'smul'
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = call { i64, i1 } @llvm.smul.with.overflow.i64(i64 undef, i64 undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V2I64 = call { <2 x i64>, <2 x i1> } @llvm.smul.with.overflow.v2i64(<2 x i64> undef, <2 x i64> undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V4I64 = call { <4 x i64>, <4 x i1> } @llvm.smul.with.overflow.v4i64(<4 x i64> undef, <4 x i64> undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %V8I64 = call { <8 x i64>, <8 x i1> } @llvm.smul.with.overflow.v8i64(<8 x i64> undef, <8 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V2I64 = call { <2 x i64>, <2 x i1> } @llvm.smul.with.overflow.v2i64(<2 x i64> undef, <2 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4I64 = call { <4 x i64>, <4 x i1> } @llvm.smul.with.overflow.v4i64(<4 x i64> undef, <4 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 136 for instruction: %V8I64 = call { <8 x i64>, <8 x i1> } @llvm.smul.with.overflow.v8i64(<8 x i64> undef, <8 x i64> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %I32 = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 undef, i32 undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %V4I32 = call { <4 x i32>, <4 x i1> } @llvm.smul.with.overflow.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 76 for instruction: %V8I32 = call { <8 x i32>, <8 x i1> } @llvm.smul.with.overflow.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 152 for instruction: %V16I32 = call { <16 x i32>, <16 x i1> } @llvm.smul.with.overflow.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I16 = call { i16, i1 } @llvm.smul.with.overflow.i16(i16 undef, i16 undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V8I16 = call { <8 x i16>, <8 x i1> } @llvm.smul.with.overflow.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V16I16 = call { <16 x i16>, <16 x i1> } @llvm.smul.with.overflow.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V32I16 = call { <32 x i16>, <32 x i1> } @llvm.smul.with.overflow.v32i16(<32 x i16> undef, <32 x i16> undef)
(63 lines not shown)
 declare {i8, i1} @llvm.umul.with.overflow.i8(i8, i8)
 declare {<16 x i8>, <16 x i1>} @llvm.umul.with.overflow.v16i8(<16 x i8>, <16 x i8>)
 declare {<32 x i8>, <32 x i1>} @llvm.umul.with.overflow.v32i8(<32 x i8>, <32 x i8>)
 declare {<64 x i8>, <64 x i1>} @llvm.umul.with.overflow.v64i8(<64 x i8>, <64 x i8>)

 define i32 @umul(i32 %arg) {
 ; RECIP-LABEL: 'umul'
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 undef, i64 undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %V2I64 = call { <2 x i64>, <2 x i1> } @llvm.umul.with.overflow.v2i64(<2 x i64> undef, <2 x i64> undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %V4I64 = call { <4 x i64>, <4 x i1> } @llvm.umul.with.overflow.v4i64(<4 x i64> undef, <4 x i64> undef)
-; RECIP-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V8I64 = call { <8 x i64>, <8 x i1> } @llvm.umul.with.overflow.v8i64(<8 x i64> undef, <8 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %V2I64 = call { <2 x i64>, <2 x i1> } @llvm.umul.with.overflow.v2i64(<2 x i64> undef, <2 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V4I64 = call { <4 x i64>, <4 x i1> } @llvm.umul.with.overflow.v4i64(<4 x i64> undef, <4 x i64> undef)
+; RECIP-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V8I64 = call { <8 x i64>, <8 x i1> } @llvm.umul.with.overflow.v8i64(<8 x i64> undef, <8 x i64> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %I32 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 undef, i32 undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V4I32 = call { <4 x i32>, <4 x i1> } @llvm.umul.with.overflow.v4i32(<4 x i32> undef, <4 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 74 for instruction: %V8I32 = call { <8 x i32>, <8 x i1> } @llvm.umul.with.overflow.v8i32(<8 x i32> undef, <8 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 148 for instruction: %V16I32 = call { <16 x i32>, <16 x i1> } @llvm.umul.with.overflow.v16i32(<16 x i32> undef, <16 x i32> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %I16 = call { i16, i1 } @llvm.umul.with.overflow.i16(i16 undef, i16 undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V8I16 = call { <8 x i16>, <8 x i1> } @llvm.umul.with.overflow.v8i16(<8 x i16> undef, <8 x i16> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V16I16 = call { <16 x i16>, <16 x i1> } @llvm.umul.with.overflow.v16i16(<16 x i16> undef, <16 x i16> undef)
 ; RECIP-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V32I16 = call { <32 x i16>, <32 x i1> } @llvm.umul.with.overflow.v32i16(<32 x i16> undef, <32 x i16> undef)
(47 lines not shown)

llvm/test/Analysis/CostModel/AArch64/bswap.ll

 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 2
 ; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64--linux-gnu < %s | FileCheck %s

 ; Verify the cost of bswap instructions.

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

 declare i16 @llvm.bswap.i16(i16)
 declare i32 @llvm.bswap.i32(i32)
 declare i64 @llvm.bswap.i64(i64)

 declare <4 x i16> @llvm.bswap.v4i16(<4 x i16>)
 declare <8 x i16> @llvm.bswap.v8i16(<8 x i16>)
 declare <16 x i16> @llvm.bswap.v16i16(<16 x i16>)
	declare <2 x i32> @llvm.bswap.v2i32(<2 x i32>)			declare <2 x i32> @llvm.bswap.v2i32(<2 x i32>)
	declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>)			declare <4 x i32> @llvm.bswap.v4i32(<4 x i32>)
	declare <8 x i32> @llvm.bswap.v8i32(<8 x i32>)			declare <8 x i32> @llvm.bswap.v8i32(<8 x i32>)
	declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>)			declare <2 x i64> @llvm.bswap.v2i64(<2 x i64>)
	declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>)			declare <4 x i64> @llvm.bswap.v4i64(<4 x i64>)
	declare <3 x i32> @llvm.bswap.v3i32(<3 x i32>)			declare <3 x i32> @llvm.bswap.v3i32(<3 x i32>)
	declare <4 x i48> @llvm.bswap.v4i48(<4 x i48>)			declare <4 x i48> @llvm.bswap.v4i48(<4 x i48>)

	[16 lines not shown]
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16 = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16 = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i16 = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16i16 = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i32 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32 = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32 = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32 = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8i32 = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2i64 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i64 = call <4 x i64> @llvm.bswap.v4i64(<4 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i64 = call <4 x i64> @llvm.bswap.v4i64(<4 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3i32 = call <3 x i32> @llvm.bswap.v3i32(<3 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3i32 = call <3 x i32> @llvm.bswap.v3i32(<3 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v4i48 = call <4 x i48> @llvm.bswap.v4i48(<4 x i48> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v4i48 = call <4 x i48> @llvm.bswap.v4i48(<4 x i48> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%v4i16 = call <4 x i16> @llvm.bswap.v4i16(<4 x i16> undef)			%v4i16 = call <4 x i16> @llvm.bswap.v4i16(<4 x i16> undef)
	%v8i16 = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> undef)			%v8i16 = call <8 x i16> @llvm.bswap.v8i16(<8 x i16> undef)
	%v16i16 = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> undef)			%v16i16 = call <16 x i16> @llvm.bswap.v16i16(<16 x i16> undef)
	%v2i32 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> undef)			%v2i32 = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> undef)
	%v4i32 = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> undef)			%v4i32 = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> undef)
	%v8i32 = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> undef)			%v8i32 = call <8 x i32> @llvm.bswap.v8i32(<8 x i32> undef)
	%v2i64 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> undef)			%v2i64 = call <2 x i64> @llvm.bswap.v2i64(<2 x i64> undef)
	%v4i64 = call <4 x i64> @llvm.bswap.v4i64(<4 x i64> undef)			%v4i64 = call <4 x i64> @llvm.bswap.v4i64(<4 x i64> undef)

	%v3i32 = call <3 x i32> @llvm.bswap.v3i32(<3 x i32> undef)			%v3i32 = call <3 x i32> @llvm.bswap.v3i32(<3 x i32> undef)
	%v4i48 = call <4 x i48> @llvm.bswap.v4i48(<4 x i48> undef)			%v4i48 = call <4 x i48> @llvm.bswap.v4i48(<4 x i48> undef)
	ret void			ret void
	}			}

llvm/test/Analysis/CostModel/AArch64/cast.ll

[941 lines not shown]
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r122 = fptoui <4 x double> undef to <4 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r122 = fptoui <4 x double> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r123 = fptosi <4 x double> undef to <4 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r123 = fptosi <4 x double> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r124 = fptoui <4 x double> undef to <4 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r124 = fptoui <4 x double> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r125 = fptosi <4 x double> undef to <4 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r125 = fptosi <4 x double> undef to <4 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r126 = fptoui <4 x double> undef to <4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r126 = fptoui <4 x double> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r127 = fptosi <4 x double> undef to <4 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r127 = fptosi <4 x double> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r128 = fptoui <4 x double> undef to <4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r128 = fptoui <4 x double> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r129 = fptosi <4 x double> undef to <4 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r129 = fptosi <4 x double> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 57 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 57 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r132 = fptoui <8 x float> undef to <8 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r132 = fptoui <8 x float> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r133 = fptosi <8 x float> undef to <8 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r133 = fptosi <8 x float> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r134 = fptoui <8 x float> undef to <8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r134 = fptoui <8 x float> undef to <8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r135 = fptosi <8 x float> undef to <8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r135 = fptosi <8 x float> undef to <8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r136 = fptoui <8 x float> undef to <8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r136 = fptoui <8 x float> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r137 = fptosi <8 x float> undef to <8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r137 = fptosi <8 x float> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r138 = fptoui <8 x float> undef to <8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r138 = fptoui <8 x float> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r139 = fptosi <8 x float> undef to <8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r139 = fptosi <8 x float> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r142 = fptoui <8 x double> undef to <8 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r142 = fptoui <8 x double> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r143 = fptosi <8 x double> undef to <8 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r143 = fptosi <8 x double> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r144 = fptoui <8 x double> undef to <8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r144 = fptoui <8 x double> undef to <8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r145 = fptosi <8 x double> undef to <8 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r145 = fptosi <8 x double> undef to <8 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r146 = fptoui <8 x double> undef to <8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r146 = fptoui <8 x double> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r147 = fptosi <8 x double> undef to <8 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r147 = fptosi <8 x double> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r148 = fptoui <8 x double> undef to <8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r148 = fptoui <8 x double> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r149 = fptosi <8 x double> undef to <8 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r149 = fptosi <8 x double> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 115 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 83 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 115 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 83 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r152 = fptoui <16 x float> undef to <16 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r152 = fptoui <16 x float> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r153 = fptosi <16 x float> undef to <16 x i8>		; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r153 = fptosi <16 x float> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r154 = fptoui <16 x float> undef to <16 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r154 = fptoui <16 x float> undef to <16 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r155 = fptosi <16 x float> undef to <16 x i16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r155 = fptosi <16 x float> undef to <16 x i16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r156 = fptoui <16 x float> undef to <16 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r156 = fptoui <16 x float> undef to <16 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>
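
The changed costs in the hunk above (the fptoui/fptosi conversions to <N x i1>) are consistent with a simple per-lane reading: if a scalarized conversion pays for one extract and one insert per lane, lowering the insert/extract cost from 3 to 2 should reduce the total by 2 per lane. The sketch below only checks that arithmetic against the CHECK lines. The function name and the one-extract-plus-one-insert model are assumptions made for illustration, not LLVM's actual cost code in AArch64TargetTransformInfo.cpp.

```python
# Hypothetical model, not LLVM's implementation: assume a scalarized
# vector operation pays one extract and one insert for every lane.
OLD_INSERT_EXTRACT_COST = 3  # value before this change
NEW_INSERT_EXTRACT_COST = 2  # value after this change


def scalarization_delta(lanes: int) -> int:
    """Expected drop in total cost for a vector with the given lane count."""
    per_lane_old = 2 * OLD_INSERT_EXTRACT_COST  # one extract + one insert
    per_lane_new = 2 * NEW_INSERT_EXTRACT_COST
    return lanes * (per_lane_old - per_lane_new)


# (lanes, old cost, new cost) copied from the CHECK lines for
# %r110 (<4 x float>), %r130 (<8 x float>) and %r150 (<16 x float>).
observed = [(4, 28, 20), (8, 57, 41), (16, 115, 83)]

for lanes, old, new in observed:
    # The observed reduction matches the per-lane model at every width.
    assert old - new == scalarization_delta(lanes)
```

The model only explains the difference between the old and new totals. It does not attempt to reproduce the absolute costs, which also include the scalar conversions and any other terms the cost model adds.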
[368 lines not shown]
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>
[195 lines not shown]
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r105 = fptosi <2 x double> undef to <2 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r106 = fptoui <2 x double> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r115 = fptosi <4 x float> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r116 = fptoui <4 x float> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>
[1,610 lines not shown]
;
store i16 %r4, ptr undef		store i16 %r4, ptr undef
%r5 = trunc i16 undef to i8		%r5 = trunc i16 undef to i8
store i8 %r5, ptr undef		store i8 %r5, ptr undef
ret i32 undef		ret i32 undef
}		}

define void @extend_extract() {		define void @extend_extract() {
; CHECK-LABEL: 'extend_extract'		; CHECK-LABEL: 'extend_extract'
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e8 = extractelement <8 x i8> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e8 = extractelement <8 x i8> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e16 = extractelement <8 x i16> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e16 = extractelement <8 x i16> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e32 = extractelement <8 x i32> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e32 = extractelement <8 x i32> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_16 = sext i8 %e8 to i16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_16 = sext i8 %e8 to i16
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_16 = zext i8 %e8 to i16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_16 = zext i8 %e8 to i16
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_32 = sext i8 %e8 to i32		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_32 = sext i8 %e8 to i32
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_32 = zext i8 %e8 to i32		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_32 = zext i8 %e8 to i32
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_64 = sext i8 %e8 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_64 = sext i8 %e8 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z8_64 = zext i8 %e8 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z8_64 = zext i8 %e8 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_32 = sext i16 %e16 to i32		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_32 = sext i16 %e16 to i32
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z16_32 = zext i16 %e16 to i32		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z16_32 = zext i16 %e16 to i32
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_64 = sext i16 %e16 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_64 = sext i16 %e16 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z16_64 = zext i16 %e16 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z16_64 = zext i16 %e16 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s32_64 = sext i32 %e32 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s32_64 = sext i32 %e32 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z32_64 = zext i32 %e32 to i64		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z32_64 = zext i32 %e32 to i64
; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: call void @use(i16 %s8_16, i16 %z8_16, i32 %s8_32, i32 %z8_32, i64 %s8_64, i64 %z8_64, i32 %s16_32, i32 %z16_32, i64 %s16_64, i64 %z16_64, i64 %s32_64, i64 %z32_64)		; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: call void @use(i16 %s8_16, i16 %z8_16, i32 %s8_32, i32 %z8_32, i64 %s8_64, i64 %z8_64, i32 %s16_32, i32 %z16_32, i64 %s16_64, i64 %z16_64, i64 %s32_64, i64 %z32_64)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; SVE-LABEL: 'extend_extract'		; SVE-LABEL: 'extend_extract'
; SVE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e8 = extractelement <8 x i8> undef, i32 1		; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e8 = extractelement <8 x i8> undef, i32 1
; SVE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e16 = extractelement <8 x i16> undef, i32 1		; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e16 = extractelement <8 x i16> undef, i32 1
; SVE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %e32 = extractelement <8 x i32> undef, i32 1		; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %e32 = extractelement <8 x i32> undef, i32 1
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_16 = sext i8 %e8 to i16		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_16 = sext i8 %e8 to i16
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_16 = zext i8 %e8 to i16		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_16 = zext i8 %e8 to i16
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_32 = sext i8 %e8 to i32		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_32 = sext i8 %e8 to i32
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_32 = zext i8 %e8 to i32		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z8_32 = zext i8 %e8 to i32
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_64 = sext i8 %e8 to i64		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8_64 = sext i8 %e8 to i64
; SVE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z8_64 = zext i8 %e8 to i64		; SVE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %z8_64 = zext i8 %e8 to i64
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_32 = sext i16 %e16 to i32		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16_32 = sext i16 %e16 to i32
; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z16_32 = zext i16 %e16 to i32		; SVE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %z16_32 = zext i16 %e16 to i32
[... 40 lines not shown ...]
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %r150 = fptoui <16 x half> undef to <16 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r150 = fptoui <16 x half> undef to <16 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %r151 = fptosi <16 x half> undef to <16 x i1>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r151 = fptosi <16 x half> undef to <16 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %r152 = fptoui <16 x half> undef to <16 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r152 = fptoui <16 x half> undef to <16 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %r153 = fptosi <16 x half> undef to <16 x i8>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r153 = fptosi <16 x half> undef to <16 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r154 = fptoui <16 x half> undef to <16 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r154 = fptoui <16 x half> undef to <16 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r155 = fptosi <16 x half> undef to <16 x i16>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r155 = fptosi <16 x half> undef to <16 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %r156 = fptoui <16 x half> undef to <16 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %r156 = fptoui <16 x half> undef to <16 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %r157 = fptosi <16 x half> undef to <16 x i32>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 82 for instruction: %r157 = fptosi <16 x half> undef to <16 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 70 for instruction: %r158 = fptoui <16 x half> undef to <16 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 86 for instruction: %r158 = fptoui <16 x half> undef to <16 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 70 for instruction: %r159 = fptosi <16 x half> undef to <16 x i64>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 86 for instruction: %r159 = fptosi <16 x half> undef to <16 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r250 = uitofp <8 x i1> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r250 = uitofp <8 x i1> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r251 = sitofp <8 x i1> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r251 = sitofp <8 x i1> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r252 = uitofp <8 x i8> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r252 = uitofp <8 x i8> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r253 = sitofp <8 x i8> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r253 = sitofp <8 x i8> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r254 = uitofp <8 x i16> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r254 = uitofp <8 x i16> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r255 = sitofp <8 x i16> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r255 = sitofp <8 x i16> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r256 = uitofp <8 x i32> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r256 = uitofp <8 x i32> undef to <8 x half>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r257 = sitofp <8 x i32> undef to <8 x half>		; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r257 = sitofp <8 x i32> undef to <8 x half>
[... 98 lines not shown ...]
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %r150 = fptoui <16 x half> undef to <16 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r150 = fptoui <16 x half> undef to <16 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 129 for instruction: %r151 = fptosi <16 x half> undef to <16 x i1>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r151 = fptosi <16 x half> undef to <16 x i1>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r152 = fptoui <16 x half> undef to <16 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r152 = fptoui <16 x half> undef to <16 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r153 = fptosi <16 x half> undef to <16 x i8>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r153 = fptosi <16 x half> undef to <16 x i8>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r154 = fptoui <16 x half> undef to <16 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r154 = fptoui <16 x half> undef to <16 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r155 = fptosi <16 x half> undef to <16 x i16>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r155 = fptosi <16 x half> undef to <16 x i16>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r156 = fptoui <16 x half> undef to <16 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r156 = fptoui <16 x half> undef to <16 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r157 = fptosi <16 x half> undef to <16 x i32>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r157 = fptosi <16 x half> undef to <16 x i32>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 70 for instruction: %r158 = fptoui <16 x half> undef to <16 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 86 for instruction: %r158 = fptoui <16 x half> undef to <16 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 70 for instruction: %r159 = fptosi <16 x half> undef to <16 x i64>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 86 for instruction: %r159 = fptosi <16 x half> undef to <16 x i64>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r250 = uitofp <8 x i1> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r250 = uitofp <8 x i1> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r251 = sitofp <8 x i1> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r251 = sitofp <8 x i1> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r252 = uitofp <8 x i8> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r252 = uitofp <8 x i8> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r253 = sitofp <8 x i8> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r253 = sitofp <8 x i8> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r254 = uitofp <8 x i16> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r254 = uitofp <8 x i16> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r255 = sitofp <8 x i16> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r255 = sitofp <8 x i16> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r256 = uitofp <8 x i32> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r256 = uitofp <8 x i32> undef to <8 x half>
; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r257 = sitofp <8 x i32> undef to <8 x half>		; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r257 = sitofp <8 x i32> undef to <8 x half>
[... 25 lines not shown ...]
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>		; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r90 = fptoui <2 x half> undef to <2 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r91 = fptosi <2 x half> undef to <2 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r92 = fptoui <2 x half> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r93 = fptosi <2 x half> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r94 = fptoui <2 x half> undef to <2 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>		; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>

llvm/test/Analysis/CostModel/AArch64/cmp.ll

	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c64 = icmp ne i64 undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c64 = icmp ne i64 undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %c128 = icmp ult i128 undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %c128 = icmp ult i128 undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv16i8 = icmp slt <16 x i8> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv16i8 = icmp slt <16 x i8> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv8i16 = icmp ult <8 x i16> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv8i16 = icmp ult <8 x i16> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv4i32 = icmp sge <4 x i32> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cv4i32 = icmp sge <4 x i32> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf16 = fcmp oge half undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf16 = fcmp oge half undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf32 = fcmp ogt float undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf32 = fcmp ogt float undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf64 = fcmp ogt double undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cf64 = fcmp ogt double undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cfv816 = fcmp olt <8 x half> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cfv816 = fcmp olt <8 x half> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cfv432 = fcmp oge <4 x float> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cfv432 = fcmp oge <4 x float> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cfv264 = fcmp oge <2 x double> undef, undef			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cfv264 = fcmp oge <2 x double> undef, undef
	; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; CHECK-SIZE-LABEL: 'cmps'			; CHECK-SIZE-LABEL: 'cmps'
	; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c8 = icmp slt i8 undef, undef			; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c8 = icmp slt i8 undef, undef
	; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c16 = icmp ult i16 undef, undef			; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c16 = icmp ult i16 undef, undef
	; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c32 = icmp sge i32 undef, undef			; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %c32 = icmp sge i32 undef, undef

llvm/test/Analysis/CostModel/AArch64/ctlz.ll

	declare i16 @llvm.ctlz.i16(i16)			declare i16 @llvm.ctlz.i16(i16)
	declare i8 @llvm.ctlz.i8(i8)			declare i8 @llvm.ctlz.i8(i8)

	; Verify the cost of vector ctlz instructions.			; Verify the cost of vector ctlz instructions.

	define <2 x i64> @test_ctlz_v2i64(<2 x i64> %a) {			define <2 x i64> @test_ctlz_v2i64(<2 x i64> %a) {
	;			;
	; CHECK-LABEL: 'test_ctlz_v2i64'			; CHECK-LABEL: 'test_ctlz_v2i64'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %ctlz = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> %a, i1 false)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %ctlz = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> %a, i1 false)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %ctlz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %ctlz
	;			;
	%ctlz = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> %a, i1 false)			%ctlz = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> %a, i1 false)
	ret <2 x i64> %ctlz			ret <2 x i64> %ctlz
	}			}

	define <2 x i32> @test_ctlz_v2i32(<2 x i32> %a) {			define <2 x i32> @test_ctlz_v2i32(<2 x i32> %a) {
	;			;
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %ctlz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %ctlz
	;			;
	%ctlz = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> %a, i1 false)			%ctlz = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> %a, i1 false)
	ret <16 x i8> %ctlz			ret <16 x i8> %ctlz
	}			}

	define <4 x i64> @test_ctlz_v4i64(<4 x i64> %a) {			define <4 x i64> @test_ctlz_v4i64(<4 x i64> %a) {
	; CHECK-LABEL: 'test_ctlz_v4i64'			; CHECK-LABEL: 'test_ctlz_v4i64'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %ctlz = call <4 x i64> @llvm.ctlz.v4i64(<4 x i64> %a, i1 false)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %ctlz = call <4 x i64> @llvm.ctlz.v4i64(<4 x i64> %a, i1 false)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i64> %ctlz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i64> %ctlz
	;			;
	%ctlz = call <4 x i64> @llvm.ctlz.v4i64(<4 x i64> %a, i1 false)			%ctlz = call <4 x i64> @llvm.ctlz.v4i64(<4 x i64> %a, i1 false)
	ret <4 x i64> %ctlz			ret <4 x i64> %ctlz
	}			}

	define <8 x i32> @test_ctlz_v8i32(<8 x i32> %a) {			define <8 x i32> @test_ctlz_v8i32(<8 x i32> %a) {
	; CHECK-LABEL: 'test_ctlz_v8i32'			; CHECK-LABEL: 'test_ctlz_v8i32'

llvm/test/Analysis/CostModel/AArch64/cttz.ll

	declare i16 @llvm.cttz.i16(i16)			declare i16 @llvm.cttz.i16(i16)
	declare i8 @llvm.cttz.i8(i8)			declare i8 @llvm.cttz.i8(i8)

	; Verify the cost of vector cttz instructions.			; Verify the cost of vector cttz instructions.

	define <2 x i64> @test_cttz_v2i64(<2 x i64> %a) {			define <2 x i64> @test_cttz_v2i64(<2 x i64> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v2i64'			; CHECK-LABEL: 'test_cttz_v2i64'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cttz = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %cttz = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %cttz
	;			;
	%cttz = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> %a, i1 true)			%cttz = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> %a, i1 true)
	ret <2 x i64> %cttz			ret <2 x i64> %cttz
	}			}

	define <2 x i32> @test_cttz_v2i32(<2 x i32> %a) {			define <2 x i32> @test_cttz_v2i32(<2 x i32> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v2i32'			; CHECK-LABEL: 'test_cttz_v2i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cttz = call <2 x i32> @llvm.cttz.v2i32(<2 x i32> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %cttz = call <2 x i32> @llvm.cttz.v2i32(<2 x i32> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %cttz
	;			;
	%cttz = call <2 x i32> @llvm.cttz.v2i32(<2 x i32> %a, i1 true)			%cttz = call <2 x i32> @llvm.cttz.v2i32(<2 x i32> %a, i1 true)
	ret <2 x i32> %cttz			ret <2 x i32> %cttz
	}			}

	define <4 x i32> @test_cttz_v4i32(<4 x i32> %a) {			define <4 x i32> @test_cttz_v4i32(<4 x i32> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v4i32'			; CHECK-LABEL: 'test_cttz_v4i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %cttz
	;			;
	%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 true)			%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 true)
	ret <4 x i32> %cttz			ret <4 x i32> %cttz
	}			}

	define <2 x i16> @test_cttz_v2i16(<2 x i16> %a) {			define <2 x i16> @test_cttz_v2i16(<2 x i16> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v2i16'			; CHECK-LABEL: 'test_cttz_v2i16'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cttz = call <2 x i16> @llvm.cttz.v2i16(<2 x i16> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %cttz = call <2 x i16> @llvm.cttz.v2i16(<2 x i16> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i16> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i16> %cttz
	;			;
	%cttz = call <2 x i16> @llvm.cttz.v2i16(<2 x i16> %a, i1 true)			%cttz = call <2 x i16> @llvm.cttz.v2i16(<2 x i16> %a, i1 true)
	ret <2 x i16> %cttz			ret <2 x i16> %cttz
	}			}

	define <4 x i16> @test_cttz_v4i16(<4 x i16> %a) {			define <4 x i16> @test_cttz_v4i16(<4 x i16> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v4i16'			; CHECK-LABEL: 'test_cttz_v4i16'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cttz = call <4 x i16> @llvm.cttz.v4i16(<4 x i16> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %cttz = call <4 x i16> @llvm.cttz.v4i16(<4 x i16> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %cttz
	;			;
	%cttz = call <4 x i16> @llvm.cttz.v4i16(<4 x i16> %a, i1 true)			%cttz = call <4 x i16> @llvm.cttz.v4i16(<4 x i16> %a, i1 true)
	ret <4 x i16> %cttz			ret <4 x i16> %cttz
	}			}

	define <8 x i16> @test_cttz_v8i16(<8 x i16> %a) {			define <8 x i16> @test_cttz_v8i16(<8 x i16> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v8i16'			; CHECK-LABEL: 'test_cttz_v8i16'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %cttz = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %cttz = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> %a, i1 true)
				efriedma: Worth noting the actual cost here is 4 instructions. (We don't scalarize it; we lower using `cnt`.)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %cttz
	;			;
	%cttz = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> %a, i1 true)			%cttz = call <8 x i16> @llvm.cttz.v8i16(<8 x i16> %a, i1 true)
	ret <8 x i16> %cttz			ret <8 x i16> %cttz
	}			}

	define <2 x i8> @test_cttz_v2i8(<2 x i8> %a) {			define <2 x i8> @test_cttz_v2i8(<2 x i8> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v2i8'			; CHECK-LABEL: 'test_cttz_v2i8'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cttz = call <2 x i8> @llvm.cttz.v2i8(<2 x i8> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %cttz = call <2 x i8> @llvm.cttz.v2i8(<2 x i8> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i8> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i8> %cttz
	;			;
	%cttz = call <2 x i8> @llvm.cttz.v2i8(<2 x i8> %a, i1 true)			%cttz = call <2 x i8> @llvm.cttz.v2i8(<2 x i8> %a, i1 true)
	ret <2 x i8> %cttz			ret <2 x i8> %cttz
	}			}

	define <4 x i8> @test_cttz_v4i8(<4 x i8> %a) {			define <4 x i8> @test_cttz_v4i8(<4 x i8> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v4i8'			; CHECK-LABEL: 'test_cttz_v4i8'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cttz = call <4 x i8> @llvm.cttz.v4i8(<4 x i8> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %cttz = call <4 x i8> @llvm.cttz.v4i8(<4 x i8> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %cttz
	;			;
	%cttz = call <4 x i8> @llvm.cttz.v4i8(<4 x i8> %a, i1 true)			%cttz = call <4 x i8> @llvm.cttz.v4i8(<4 x i8> %a, i1 true)
	ret <4 x i8> %cttz			ret <4 x i8> %cttz
	}			}

	define <8 x i8> @test_cttz_v8i8(<8 x i8> %a) {			define <8 x i8> @test_cttz_v8i8(<8 x i8> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v8i8'			; CHECK-LABEL: 'test_cttz_v8i8'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %cttz = call <8 x i8> @llvm.cttz.v8i8(<8 x i8> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %cttz = call <8 x i8> @llvm.cttz.v8i8(<8 x i8> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %cttz
	;			;
	%cttz = call <8 x i8> @llvm.cttz.v8i8(<8 x i8> %a, i1 true)			%cttz = call <8 x i8> @llvm.cttz.v8i8(<8 x i8> %a, i1 true)
	ret <8 x i8> %cttz			ret <8 x i8> %cttz
	}			}

	define <16 x i8> @test_cttz_v16i8(<16 x i8> %a) {			define <16 x i8> @test_cttz_v16i8(<16 x i8> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v16i8'			; CHECK-LABEL: 'test_cttz_v16i8'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 106 for instruction: %cttz = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %cttz = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %cttz
	;			;
	%cttz = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %a, i1 true)			%cttz = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %a, i1 true)
	ret <16 x i8> %cttz			ret <16 x i8> %cttz
	}			}

	define <4 x i64> @test_cttz_v4i64(<4 x i64> %a) {			define <4 x i64> @test_cttz_v4i64(<4 x i64> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v4i64'			; CHECK-LABEL: 'test_cttz_v4i64'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %cttz = call <4 x i64> @llvm.cttz.v4i64(<4 x i64> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %cttz = call <4 x i64> @llvm.cttz.v4i64(<4 x i64> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i64> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i64> %cttz
	;			;
	%cttz = call <4 x i64> @llvm.cttz.v4i64(<4 x i64> %a, i1 true)			%cttz = call <4 x i64> @llvm.cttz.v4i64(<4 x i64> %a, i1 true)
	ret <4 x i64> %cttz			ret <4 x i64> %cttz
	}			}

	define <8 x i32> @test_cttz_v8i32(<8 x i32> %a) {			define <8 x i32> @test_cttz_v8i32(<8 x i32> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v8i32'			; CHECK-LABEL: 'test_cttz_v8i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %cttz = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %cttz = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i32> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i32> %cttz
	;			;
	%cttz = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> %a, i1 true)			%cttz = call <8 x i32> @llvm.cttz.v8i32(<8 x i32> %a, i1 true)
	ret <8 x i32> %cttz			ret <8 x i32> %cttz
	}			}

	define <16 x i16> @test_cttz_v16i16(<16 x i16> %a) {			define <16 x i16> @test_cttz_v16i16(<16 x i16> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v16i16'			; CHECK-LABEL: 'test_cttz_v16i16'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 100 for instruction: %cttz = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %cttz = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i16> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i16> %cttz
	;			;
	%cttz = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> %a, i1 true)			%cttz = call <16 x i16> @llvm.cttz.v16i16(<16 x i16> %a, i1 true)
	ret <16 x i16> %cttz			ret <16 x i16> %cttz
	}			}

	define <32 x i8> @test_cttz_v32i8(<32 x i8> %a) {			define <32 x i8> @test_cttz_v32i8(<32 x i8> %a) {
	;			;
	; CHECK-LABEL: 'test_cttz_v32i8'			; CHECK-LABEL: 'test_cttz_v32i8'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 212 for instruction: %cttz = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> %a, i1 true)			; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %cttz = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> %a, i1 true)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <32 x i8> %cttz			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <32 x i8> %cttz
	;			;
	%cttz = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> %a, i1 true)			%cttz = call <32 x i8> @llvm.cttz.v32i8(<32 x i8> %a, i1 true)
	ret <32 x i8> %cttz			ret <32 x i8> %cttz
	}			}

	declare <2 x i64> @llvm.cttz.v2i64(<2 x i64>, i1)			declare <2 x i64> @llvm.cttz.v2i64(<2 x i64>, i1)
	declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1)			declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1)
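
The old and new cttz costs in this file follow a simple pattern, sketched below. This is an inference from the numbers in the test, not the code in AArch64TargetTransformInfo.cpp: it assumes the intrinsic is costed as fully scalarized (one extract and one insert per lane plus one scalar op of cost 1), that the old model charged 3 per lane with lane 0 free, and that the new model charges 2 for every lane. The function names and the `parts` parameter (number of legal registers a wide type is split into) are hypothetical.

```python
# Hypothetical model that reproduces the cttz.ll expectations; not an LLVM API.
def scalarized_unary_cost(num_elts, lane_cost, lane0_free, scalar_cost=1, parts=1):
    """Cost of scalarizing a one-operand vector op split into `parts` registers."""
    per_part = num_elts // parts
    # Assumption: when lane 0 is free, only the remaining lanes are charged.
    paid_lanes = per_part - 1 if lane0_free else per_part
    extract = paid_lanes * lane_cost   # move each source lane out of the vector
    insert = paid_lanes * lane_cost    # move each result back into a vector
    return parts * (extract + insert + per_part * scalar_cost)

def old_cost(num_elts, parts=1):
    # Before the patch: insert/extract cost 3, lane 0 treated as free.
    return scalarized_unary_cost(num_elts, 3, True, parts=parts)

def new_cost(num_elts, parts=1):
    # After the patch: insert/extract cost 2, lane 0 charged for integer vectors.
    return scalarized_unary_cost(num_elts, 2, False, parts=parts)

if __name__ == "__main__":
    # (type, element count, assumed number of legal registers)
    cases = [("v2i64", 2, 1), ("v4i32", 4, 1), ("v8i16", 8, 1), ("v16i8", 16, 1),
             ("v4i64", 4, 2), ("v8i32", 8, 2), ("v16i16", 16, 2), ("v32i8", 32, 2)]
    for name, n, parts in cases:
        print(name, old_cost(n, parts), "->", new_cost(n, parts))
```

Under these assumptions the new cost is always 5 per element, which is why two-element vectors get more expensive (8 to 10) while wider ones get cheaper (for example 106 to 80 for v16i8). As the inline comment above notes, these are modelled costs; the actual lowering does not scalarize.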

llvm/test/Analysis/CostModel/AArch64/div.ll

; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" 2>&1 -disable-output \| FileCheck %s		; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" 2>&1 -disable-output \| FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

define i32 @sdiv() {		define i32 @sdiv() {
; CHECK-LABEL: 'sdiv'		; CHECK-LABEL: 'sdiv'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = sdiv i64 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = sdiv i64 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = sdiv <2 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = sdiv <2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = sdiv <4 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = sdiv <4 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = sdiv <8 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = sdiv <8 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = sdiv <4 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = sdiv <4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = sdiv <8 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = sdiv <16 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = sdiv <8 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = sdiv <8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = sdiv <16 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = sdiv <32 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = sdiv <16 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = sdiv <16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = sdiv <32 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = sdiv <64 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, undef		%I64 = sdiv i64 undef, undef
%V2i64 = sdiv <2 x i64> undef, undef		%V2i64 = sdiv <2 x i64> undef, undef
%V4i64 = sdiv <4 x i64> undef, undef		%V4i64 = sdiv <4 x i64> undef, undef
%V8i64 = sdiv <8 x i64> undef, undef		%V8i64 = sdiv <8 x i64> undef, undef

%I32 = sdiv i32 undef, undef		%I32 = sdiv i32 undef, undef
; (12 lines not shown)		; (12 lines not shown)
%V64i8 = sdiv <64 x i8> undef, undef		%V64i8 = sdiv <64 x i8> undef, undef

ret i32 undef		ret i32 undef
}		}

define i32 @udiv() {		define i32 @udiv() {
; CHECK-LABEL: 'udiv'		; CHECK-LABEL: 'udiv'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = udiv i64 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I64 = udiv i64 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = udiv <2 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = udiv <2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = udiv <4 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = udiv <4 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = udiv <8 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = udiv <8 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = udiv <4 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = udiv <4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = udiv <8 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = udiv <16 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = udiv <8 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = udiv <8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = udiv <16 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = udiv <32 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = udiv <16 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = udiv <16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = udiv <32 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = udiv <64 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, undef		%I64 = udiv i64 undef, undef
%V2i64 = udiv <2 x i64> undef, undef		%V2i64 = udiv <2 x i64> undef, undef
%V4i64 = udiv <4 x i64> undef, undef		%V4i64 = udiv <4 x i64> undef, undef
%V8i64 = udiv <8 x i64> undef, undef		%V8i64 = udiv <8 x i64> undef, undef

%I32 = udiv i32 undef, undef		%I32 = udiv i32 undef, undef
; (12 lines not shown)		; (12 lines not shown)
%V64i8 = udiv <64 x i8> undef, undef		%V64i8 = udiv <64 x i8> undef, undef

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_const() {		define i32 @sdiv_const() {
; CHECK-LABEL: 'sdiv_const'		; CHECK-LABEL: 'sdiv_const'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, 7		%I64 = sdiv i64 undef, 7
%V2i64 = sdiv <2 x i64> undef, <i64 6, i64 7>		%V2i64 = sdiv <2 x i64> undef, <i64 6, i64 7>
%V4i64 = sdiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		%V4i64 = sdiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
%V8i64 = sdiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		%V8i64 = sdiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>

%I32 = sdiv i32 undef, 7		%I32 = sdiv i32 undef, 7
; (12 lines not shown)		; (12 lines not shown)
%V64i8 = sdiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		%V64i8 = sdiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_const() {		define i32 @udiv_const() {
; CHECK-LABEL: 'udiv_const'		; CHECK-LABEL: 'udiv_const'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, 7		%I64 = udiv i64 undef, 7
%V2i64 = udiv <2 x i64> undef, <i64 6, i64 7>		%V2i64 = udiv <2 x i64> undef, <i64 6, i64 7>
%V4i64 = udiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		%V4i64 = udiv <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
%V8i64 = udiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		%V8i64 = udiv <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>

%I32 = udiv i32 undef, 7		%I32 = udiv i32 undef, 7
; (98 lines not shown)		; (98 lines not shown)
%V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		%V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_constpow2() {		define i32 @sdiv_constpow2() {
; CHECK-LABEL: 'sdiv_constpow2'		; CHECK-LABEL: 'sdiv_constpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = sdiv i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = sdiv i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I32 = sdiv i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I32 = sdiv i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I16 = sdiv i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I16 = sdiv i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I8 = sdiv i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I8 = sdiv i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, 16		%I64 = sdiv i64 undef, 16
%V2i64 = sdiv <2 x i64> undef, <i64 8, i64 16>		%V2i64 = sdiv <2 x i64> undef, <i64 8, i64 16>
%V4i64 = sdiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		%V4i64 = sdiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
%V8i64 = sdiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		%V8i64 = sdiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>

%I32 = sdiv i32 undef, 16		%I32 = sdiv i32 undef, 16
; (12 lines not shown)		; (12 lines not shown)
%V64i8 = sdiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		%V64i8 = sdiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_constpow2() {		define i32 @udiv_constpow2() {
; CHECK-LABEL: 'udiv_constpow2'		; CHECK-LABEL: 'udiv_constpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, 16		%I64 = udiv i64 undef, 16
%V2i64 = udiv <2 x i64> undef, <i64 8, i64 16>		%V2i64 = udiv <2 x i64> undef, <i64 8, i64 16>
%V4i64 = udiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		%V4i64 = udiv <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
%V8i64 = udiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		%V8i64 = udiv <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>

%I32 = udiv i32 undef, 16		%I32 = udiv i32 undef, 16
; ... (12 lines hidden in this diff view)		; ... (12 lines hidden in this diff view)
%V64i8 = udiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		%V64i8 = udiv <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_uniformconstpow2() {		define i32 @sdiv_uniformconstpow2() {
; CHECK-LABEL: 'sdiv_uniformconstpow2'		; CHECK-LABEL: 'sdiv_uniformconstpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = sdiv i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I64 = sdiv i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I32 = sdiv i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I32 = sdiv i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 54 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 116 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I16 = sdiv i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I16 = sdiv i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 61 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 122 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 102 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 244 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 204 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I8 = sdiv i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %I8 = sdiv i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 125 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 99 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 250 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 198 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 500 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 396 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, 16		%I64 = sdiv i64 undef, 16
%V2i64 = sdiv <2 x i64> undef, <i64 16, i64 16>		%V2i64 = sdiv <2 x i64> undef, <i64 16, i64 16>
%V4i64 = sdiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		%V4i64 = sdiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
%V8i64 = sdiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		%V8i64 = sdiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>

%I32 = sdiv i32 undef, 16		%I32 = sdiv i32 undef, 16
; ... (55 lines hidden in this diff view)		; ... (55 lines hidden in this diff view)
%V64i8 = udiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		%V64i8 = udiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_constnegpow2() {		define i32 @sdiv_constnegpow2() {
; CHECK-LABEL: 'sdiv_constnegpow2'		; CHECK-LABEL: 'sdiv_constnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, -16		%I64 = sdiv i64 undef, -16
%V2i64 = sdiv <2 x i64> undef, <i64 -8, i64 -16>		%V2i64 = sdiv <2 x i64> undef, <i64 -8, i64 -16>
%V4i64 = sdiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		%V4i64 = sdiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
%V8i64 = sdiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		%V8i64 = sdiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>

%I32 = sdiv i32 undef, -16		%I32 = sdiv i32 undef, -16
; ... (12 lines hidden in this diff view)		; ... (12 lines hidden in this diff view)
%V64i8 = sdiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		%V64i8 = sdiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_constnegpow2() {		define i32 @udiv_constnegpow2() {
; CHECK-LABEL: 'udiv_constnegpow2'		; CHECK-LABEL: 'udiv_constnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 220 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 336 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 672 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, -16		%I64 = udiv i64 undef, -16
%V2i64 = udiv <2 x i64> undef, <i64 -8, i64 -16>		%V2i64 = udiv <2 x i64> undef, <i64 -8, i64 -16>
%V4i64 = udiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		%V4i64 = udiv <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
%V8i64 = udiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		%V8i64 = udiv <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>

%I32 = udiv i32 undef, -16		%I32 = udiv i32 undef, -16
; ... (102 lines hidden in this diff view)

llvm/test/Analysis/CostModel/AArch64/fptoi_sat.ll

	; ... (28 lines hidden in this diff view)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f32u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f32s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f32s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f32u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f32u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f32s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2f32s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2f32u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f32(<2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2f32u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f32(<2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %v2f64s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f64(<2 x double> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s32 = call i32 @llvm.fptosi.sat.i32.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s32 = call i32 @llvm.fptosi.sat.i32.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u32 = call i32 @llvm.fptoui.sat.i32.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u32 = call i32 @llvm.fptoui.sat.i32.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s64 = call i64 @llvm.fptosi.sat.i64.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s64 = call i64 @llvm.fptosi.sat.i64.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u64 = call i64 @llvm.fptoui.sat.i64.f16(half undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u64 = call i64 @llvm.fptoui.sat.i64.f16(half undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %v2f16s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %v2f16s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %v2f16u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v2f16u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %v2f16s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %v2f16s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %v2f16u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v2f16u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %v2f16s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %v2f16s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %v2f16u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v2f16u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %v2f16s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %v2f16s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %v2f16u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v2f16u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 39 for instruction: %v2f16s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %v2f16s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %v2f16u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f16(<2 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %v2f16u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f16(<2 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 83 for instruction: %v4f16s1 = call <4 x i1> @llvm.fptosi.sat.v4i1.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 61 for instruction: %v4f16s1 = call <4 x i1> @llvm.fptosi.sat.v4i1.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 63 for instruction: %v4f16u1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %v4f16u1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v4f16s8 = call <4 x i8> @llvm.fptosi.sat.v4i8.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 61 for instruction: %v4f16s8 = call <4 x i8> @llvm.fptosi.sat.v4i8.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 63 for instruction: %v4f16u8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %v4f16u8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v4f16s16 = call <4 x i16> @llvm.fptosi.sat.v4i16.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 61 for instruction: %v4f16s16 = call <4 x i16> @llvm.fptosi.sat.v4i16.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 63 for instruction: %v4f16u16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %v4f16u16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %v4f16s32 = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %v4f16s32 = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %v4f16u32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 64 for instruction: %v4f16u32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 163 for instruction: %v4f16s64 = call <4 x i64> @llvm.fptosi.sat.v4i64.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 149 for instruction: %v4f16s64 = call <4 x i64> @llvm.fptosi.sat.v4i64.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 79 for instruction: %v4f16u64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f16(<4 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 65 for instruction: %v4f16u64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f16(<4 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 271 for instruction: %v8f16s1 = call <8 x i1> @llvm.fptosi.sat.v8i1.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 186 for instruction: %v8f16s1 = call <8 x i1> @llvm.fptosi.sat.v8i1.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 206 for instruction: %v8f16u1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 140 for instruction: %v8f16u1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 250 for instruction: %v8f16s8 = call <8 x i8> @llvm.fptosi.sat.v8i8.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 186 for instruction: %v8f16s8 = call <8 x i8> @llvm.fptosi.sat.v8i8.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %v8f16u8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 140 for instruction: %v8f16u8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 201 for instruction: %v8f16s16 = call <8 x i16> @llvm.fptosi.sat.v8i16.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 147 for instruction: %v8f16s16 = call <8 x i16> @llvm.fptosi.sat.v8i16.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 143 for instruction: %v8f16u16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %v8f16u16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %v8f16s32 = call <8 x i32> @llvm.fptosi.sat.v8i32.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 171 for instruction: %v8f16s32 = call <8 x i32> @llvm.fptosi.sat.v8i32.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 187 for instruction: %v8f16u32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 141 for instruction: %v8f16u32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 366 for instruction: %v8f16s64 = call <8 x i64> @llvm.fptosi.sat.v8i64.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 325 for instruction: %v8f16s64 = call <8 x i64> @llvm.fptosi.sat.v8i64.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 177 for instruction: %v8f16u64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f16(<8 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 143 for instruction: %v8f16u64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f16(<8 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 547 for instruction: %v16f16s1 = call <16 x i1> @llvm.fptosi.sat.v16i1.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 373 for instruction: %v16f16s1 = call <16 x i1> @llvm.fptosi.sat.v16i1.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 413 for instruction: %v16f16u1 = call <16 x i1> @llvm.fptoui.sat.v16i1.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 281 for instruction: %v16f16u1 = call <16 x i1> @llvm.fptoui.sat.v16i1.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 504 for instruction: %v16f16s8 = call <16 x i8> @llvm.fptosi.sat.v16i8.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 373 for instruction: %v16f16s8 = call <16 x i8> @llvm.fptosi.sat.v16i8.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 385 for instruction: %v16f16u8 = call <16 x i8> @llvm.fptoui.sat.v16i8.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 281 for instruction: %v16f16u8 = call <16 x i8> @llvm.fptoui.sat.v16i8.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 360 for instruction: %v16f16s16 = call <16 x i16> @llvm.fptosi.sat.v16i16.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 262 for instruction: %v16f16s16 = call <16 x i16> @llvm.fptosi.sat.v16i16.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 286 for instruction: %v16f16u16 = call <16 x i16> @llvm.fptoui.sat.v16i16.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 202 for instruction: %v16f16u16 = call <16 x i16> @llvm.fptoui.sat.v16i16.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %v16f16s32 = call <16 x i32> @llvm.fptosi.sat.v16i32.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 342 for instruction: %v16f16s32 = call <16 x i32> @llvm.fptosi.sat.v16i32.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 374 for instruction: %v16f16u32 = call <16 x i32> @llvm.fptoui.sat.v16i32.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 282 for instruction: %v16f16u32 = call <16 x i32> @llvm.fptoui.sat.v16i32.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 732 for instruction: %v16f16s64 = call <16 x i64> @llvm.fptosi.sat.v16i64.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 650 for instruction: %v16f16s64 = call <16 x i64> @llvm.fptosi.sat.v16i64.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 354 for instruction: %v16f16u64 = call <16 x i64> @llvm.fptoui.sat.v16i64.v16f16(<16 x half> undef)			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 286 for instruction: %v16f16u64 = call <16 x i64> @llvm.fptoui.sat.v16i64.v16f16(<16 x half> undef)
	; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; CHECK-FP16-LABEL: 'fp16'			; CHECK-FP16-LABEL: 'fp16'
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s1 = call i1 @llvm.fptosi.sat.i1.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s1 = call i1 @llvm.fptosi.sat.i1.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16s32 = call i32 @llvm.fptosi.sat.i32.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16s32 = call i32 @llvm.fptosi.sat.i32.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16u32 = call i32 @llvm.fptoui.sat.i32.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16u32 = call i32 @llvm.fptoui.sat.i32.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s64 = call i64 @llvm.fptosi.sat.i64.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %f16s64 = call i64 @llvm.fptosi.sat.i64.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u64 = call i64 @llvm.fptoui.sat.i64.f16(half undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %f16u64 = call i64 @llvm.fptoui.sat.i64.f16(half undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16s1 = call <2 x i1> @llvm.fptosi.sat.v2i1.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u1 = call <2 x i1> @llvm.fptoui.sat.v2i1.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16s8 = call <2 x i8> @llvm.fptosi.sat.v2i8.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u8 = call <2 x i8> @llvm.fptoui.sat.v2i8.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16s16 = call <2 x i16> @llvm.fptosi.sat.v2i16.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16u16 = call <2 x i16> @llvm.fptoui.sat.v2i16.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v2f16s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f16s32 = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2f16u32 = call <2 x i32> @llvm.fptoui.sat.v2i32.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v2f16s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 19 for instruction: %v2f16s64 = call <2 x i64> @llvm.fptosi.sat.v2i64.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f16u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f16(<2 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v2f16u64 = call <2 x i64> @llvm.fptoui.sat.v2i64.v2f16(<2 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16s1 = call <4 x i1> @llvm.fptosi.sat.v4i1.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16s1 = call <4 x i1> @llvm.fptosi.sat.v4i1.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16u1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16u1 = call <4 x i1> @llvm.fptoui.sat.v4i1.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16s8 = call <4 x i8> @llvm.fptosi.sat.v4i8.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16s8 = call <4 x i8> @llvm.fptosi.sat.v4i8.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16u8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4f16u8 = call <4 x i8> @llvm.fptoui.sat.v4i8.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f16s16 = call <4 x i16> @llvm.fptosi.sat.v4i16.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f16s16 = call <4 x i16> @llvm.fptosi.sat.v4i16.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f16u16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4f16u16 = call <4 x i16> @llvm.fptoui.sat.v4i16.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %v4f16s32 = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %v4f16s32 = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f16u32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f16u32 = call <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 100 for instruction: %v4f16s64 = call <4 x i64> @llvm.fptosi.sat.v4i64.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %v4f16s64 = call <4 x i64> @llvm.fptosi.sat.v4i64.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 19 for instruction: %v4f16u64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f16(<4 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %v4f16u64 = call <4 x i64> @llvm.fptoui.sat.v4i64.v4f16(<4 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16s1 = call <8 x i1> @llvm.fptosi.sat.v8i1.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16s1 = call <8 x i1> @llvm.fptosi.sat.v8i1.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16u1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16u1 = call <8 x i1> @llvm.fptoui.sat.v8i1.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16s8 = call <8 x i8> @llvm.fptosi.sat.v8i8.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16s8 = call <8 x i8> @llvm.fptosi.sat.v8i8.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16u8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v8f16u8 = call <8 x i8> @llvm.fptoui.sat.v8i8.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8f16s16 = call <8 x i16> @llvm.fptosi.sat.v8i16.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8f16s16 = call <8 x i16> @llvm.fptosi.sat.v8i16.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8f16u16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8f16u16 = call <8 x i16> @llvm.fptoui.sat.v8i16.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %v8f16s32 = call <8 x i32> @llvm.fptosi.sat.v8i32.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %v8f16s32 = call <8 x i32> @llvm.fptosi.sat.v8i32.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v8f16u32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v8f16u32 = call <8 x i32> @llvm.fptoui.sat.v8i32.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 198 for instruction: %v8f16s64 = call <8 x i64> @llvm.fptosi.sat.v8i64.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 206 for instruction: %v8f16s64 = call <8 x i64> @llvm.fptosi.sat.v8i64.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %v8f16u64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f16(<8 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %v8f16u64 = call <8 x i64> @llvm.fptoui.sat.v8i64.v8f16(<8 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16s1 = call <16 x i1> @llvm.fptosi.sat.v16i1.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16s1 = call <16 x i1> @llvm.fptosi.sat.v16i1.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16u1 = call <16 x i1> @llvm.fptoui.sat.v16i1.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16u1 = call <16 x i1> @llvm.fptoui.sat.v16i1.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16s8 = call <16 x i8> @llvm.fptosi.sat.v16i8.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16s8 = call <16 x i8> @llvm.fptosi.sat.v16i8.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16u8 = call <16 x i8> @llvm.fptoui.sat.v16i8.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v16f16u8 = call <16 x i8> @llvm.fptoui.sat.v16i8.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16s16 = call <16 x i16> @llvm.fptosi.sat.v16i16.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16s16 = call <16 x i16> @llvm.fptosi.sat.v16i16.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16u16 = call <16 x i16> @llvm.fptoui.sat.v16i16.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16u16 = call <16 x i16> @llvm.fptoui.sat.v16i16.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %v16f16s32 = call <16 x i32> @llvm.fptosi.sat.v16i32.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %v16f16s32 = call <16 x i32> @llvm.fptosi.sat.v16i32.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16f16u32 = call <16 x i32> @llvm.fptoui.sat.v16i32.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16f16u32 = call <16 x i32> @llvm.fptoui.sat.v16i32.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 400 for instruction: %v16f16s64 = call <16 x i64> @llvm.fptosi.sat.v16i64.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 416 for instruction: %v16f16s64 = call <16 x i64> @llvm.fptosi.sat.v16i64.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 78 for instruction: %v16f16u64 = call <16 x i64> @llvm.fptoui.sat.v16i64.v16f16(<16 x half> undef)			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 94 for instruction: %v16f16u64 = call <16 x i64> @llvm.fptoui.sat.v16i64.v16f16(<16 x half> undef)
	; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%f16s1 = call i1 @llvm.fptosi.sat.i1.f16(half undef)			%f16s1 = call i1 @llvm.fptosi.sat.i1.f16(half undef)
	%f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)			%f16u1 = call i1 @llvm.fptoui.sat.i1.f16(half undef)
	%f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)			%f16s8 = call i8 @llvm.fptosi.sat.i8.f16(half undef)
	%f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)			%f16u8 = call i8 @llvm.fptoui.sat.i8.f16(half undef)
	%f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)			%f16s16 = call i16 @llvm.fptosi.sat.i16.f16(half undef)
	%f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)			%f16u16 = call i16 @llvm.fptoui.sat.i16.f16(half undef)

llvm/test/Analysis/CostModel/AArch64/free-widening-casts.ll

	; COST-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp0 = zext <4 x i16> %a to <4 x i64>			; COST-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp0 = zext <4 x i16> %a to <4 x i64>
	define <4 x i64> @neg_llegal_vector_type_2(<4 x i16> %a, <4 x i64> %b) {			define <4 x i64> @neg_llegal_vector_type_2(<4 x i16> %a, <4 x i64> %b) {
	%tmp0 = zext <4 x i16> %a to <4 x i64>			%tmp0 = zext <4 x i16> %a to <4 x i64>
	%tmp1 = add <4 x i64> %b, %tmp0			%tmp1 = add <4 x i64> %b, %tmp0
	ret <4 x i64> %tmp1			ret <4 x i64> %tmp1
	}			}

	; COST-LABEL: neg_llegal_vector_type_3			; COST-LABEL: neg_llegal_vector_type_3
	; COST-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp0 = zext <3 x i34> %a to <3 x i68>			; COST-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %tmp0 = zext <3 x i34> %a to <3 x i68>
	define <3 x i68> @neg_llegal_vector_type_3(<3 x i34> %a, <3 x i68> %b) {			define <3 x i68> @neg_llegal_vector_type_3(<3 x i34> %a, <3 x i68> %b) {
	%tmp0 = zext <3 x i34> %a to <3 x i68>			%tmp0 = zext <3 x i34> %a to <3 x i68>
	%tmp1 = add <3 x i68> %b, %tmp0			%tmp1 = add <3 x i68> %b, %tmp0
	ret <3 x i68> %tmp1			ret <3 x i68> %tmp1
	}			}

llvm/test/Analysis/CostModel/AArch64/fshl.ll

	;			;
	entry:			entry:
	%fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 9, i8 1, i8 13, i8 7, i8 31, i8 23, i8 43, i8 51, i8 3, i8 3, i8 17, i8 3, i8 11, i8 15, i8 3, i8 3>)			%fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 9, i8 1, i8 13, i8 7, i8 31, i8 23, i8 43, i8 51, i8 3, i8 3, i8 17, i8 3, i8 11, i8 15, i8 3, i8 3>)
	ret <16 x i8> %fshl			ret <16 x i8> %fshl
	}			}

	define <16 x i8> @fshl_v16i8_3rd_arg_var(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {			define <16 x i8> @fshl_v16i8_3rd_arg_var(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {
	; CHECK-LABEL: 'fshl_v16i8_3rd_arg_var'			; CHECK-LABEL: 'fshl_v16i8_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 144 for instruction: %fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 118 for instruction: %fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)			%fshl = tail call <16 x i8> @llvm.fshl.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)
	ret <16 x i8> %fshl			ret <16 x i8> %fshl
	}			}

	declare <16 x i8> @llvm.fshl.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)			declare <16 x i8> @llvm.fshl.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)
	;			;
	entry:			entry:
	%fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> <i16 3, i16 1, i16 13, i16 8, i16 7, i16 31, i16 43, i16 51>)			%fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> <i16 3, i16 1, i16 13, i16 8, i16 7, i16 31, i16 43, i16 51>)
	ret <8 x i16> %fshl			ret <8 x i16> %fshl
	}			}

	define <8 x i16> @fshl_v8i16_3rd_arg_var(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c) {			define <8 x i16> @fshl_v8i16_3rd_arg_var(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c) {
	; CHECK-LABEL: 'fshl_v8i16_3rd_arg_var'			; CHECK-LABEL: 'fshl_v8i16_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)			%fshl = tail call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)
	ret <8 x i16> %fshl			ret <8 x i16> %fshl
	}			}

	declare <8 x i16> @llvm.fshl.v8i16(<8 x i16>, <8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.fshl.v8i16(<8 x i16>, <8 x i16>, <8 x i16>)
	;			;
	entry:			entry:
	%fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 3, i32 11, i32 2>)			%fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 3, i32 11, i32 2>)
	ret <4 x i32> %fshl			ret <4 x i32> %fshl
	}			}

	define <4 x i32> @fshl_v4i32_3rd_arg_var(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {			define <4 x i32> @fshl_v4i32_3rd_arg_var(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
	; CHECK-LABEL: 'fshl_v4i32_3rd_arg_var'			; CHECK-LABEL: 'fshl_v4i32_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)			%fshl = tail call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
	ret <4 x i32> %fshl			ret <4 x i32> %fshl
	}			}

	declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)
	;			;
	entry:			entry:
	%fshl = tail call <2 x i64> @llvm.fshl.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> <i64 1, i64 2>)			%fshl = tail call <2 x i64> @llvm.fshl.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> <i64 1, i64 2>)
	ret <2 x i64> %fshl			ret <2 x i64> %fshl
	}			}

	define <2 x i64> @fshl_v2i64_3rd_arg_var(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {			define <2 x i64> @fshl_v2i64_3rd_arg_var(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
	; CHECK-LABEL: 'fshl_v2i64_3rd_arg_var'			; CHECK-LABEL: 'fshl_v2i64_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fshl = tail call <2 x i64> @llvm.fshl.v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshl = tail call <2 x i64> @llvm.fshl.v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <2 x i64> @llvm.fshl.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)			%fshl = tail call <2 x i64> @llvm.fshl.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)
	ret <2 x i64> %fshl			ret <2 x i64> %fshl
	}			}

	declare <2 x i64> @llvm.fshl.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)			declare <2 x i64> @llvm.fshl.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)

	define <4 x i30> @fshl_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {			define <4 x i30> @fshl_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
	; CHECK-LABEL: 'fshl_v4i30_3rd_arg_var'			; CHECK-LABEL: 'fshl_v4i30_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)			%fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
	ret <4 x i30> %fshl			ret <4 x i30> %fshl
	}			}

	declare <4 x i30> @llvm.fshl.v4i30(<4 x i30>, <4 x i30>, <4 x i30>)			declare <4 x i30> @llvm.fshl.v4i30(<4 x i30>, <4 x i30>, <4 x i30>)

	define <2 x i66> @fshl_v2i66_3rd_arg_vec_const_lanes_different(<2 x i66> %a, <2 x i66> %b) {			define <2 x i66> @fshl_v2i66_3rd_arg_vec_const_lanes_different(<2 x i66> %a, <2 x i66> %b) {
	; CHECK-LABEL: 'fshl_v2i66_3rd_arg_vec_const_lanes_different'			; CHECK-LABEL: 'fshl_v2i66_3rd_arg_vec_const_lanes_different'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %fshl = tail call <2 x i66> @llvm.fshl.v2i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshl = tail call <2 x i66> @llvm.fshl.v2i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i66> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i66> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <2 x i66> @llvm.fshl.v4i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)			%fshl = tail call <2 x i66> @llvm.fshl.v4i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)
	ret <2 x i66> %fshl			ret <2 x i66> %fshl
	}			}
	declare <2 x i66> @llvm.fshl.v4i66(<2 x i66>, <2 x i66>, <2 x i66>)			declare <2 x i66> @llvm.fshl.v4i66(<2 x i66>, <2 x i66>, <2 x i66>)

	define i66 @fshl_i66(i66 %a, i66 %b) {			define i66 @fshl_i66(i66 %a, i66 %b) {
	; CHECK-LABEL: 'fshl_i66'			; CHECK-LABEL: 'fshl_i66'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %fshl = tail call i66 @llvm.fshl.i66(i66 %a, i66 %b, i66 9)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %fshl = tail call i66 @llvm.fshl.i66(i66 %a, i66 %b, i66 9)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i66 %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i66 %fshl
	;			;
	entry:			entry:
	%fshl = tail call i66 @llvm.fshl.i66(i66 %a, i66 %b, i66 9)			%fshl = tail call i66 @llvm.fshl.i66(i66 %a, i66 %b, i66 9)
	ret i66 %fshl			ret i66 %fshl
	}			}

	declare i66 @llvm.fshl.i66(i66, i66, i66)			declare i66 @llvm.fshl.i66(i66, i66, i66)

	define <2 x i128> @fshl_v2i128_3rd_arg_vec_const_lanes_different(<2 x i128> %a, <2 x i128> %b) {			define <2 x i128> @fshl_v2i128_3rd_arg_vec_const_lanes_different(<2 x i128> %a, <2 x i128> %b) {
	; CHECK-LABEL: 'fshl_v2i128_3rd_arg_vec_const_lanes_different'			; CHECK-LABEL: 'fshl_v2i128_3rd_arg_vec_const_lanes_different'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %fshl = tail call <2 x i128> @llvm.fshl.v2i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshl = tail call <2 x i128> @llvm.fshl.v2i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i128> %fshl			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i128> %fshl
	;			;
	entry:			entry:
	%fshl = tail call <2 x i128> @llvm.fshl.v4i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)			%fshl = tail call <2 x i128> @llvm.fshl.v4i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)
	ret <2 x i128> %fshl			ret <2 x i128> %fshl
	}			}
	declare <2 x i128> @llvm.fshl.v4i128(<2 x i128>, <2 x i128>, <2 x i128>)			declare <2 x i128> @llvm.fshl.v4i128(<2 x i128>, <2 x i128>, <2 x i128>)


llvm/test/Analysis/CostModel/AArch64/fshr.ll

	;			;
	entry:			entry:
	%fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 9, i8 1, i8 13, i8 7, i8 31, i8 23, i8 43, i8 51, i8 3, i8 3, i8 17, i8 3, i8 11, i8 15, i8 3, i8 3>)			%fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> <i8 9, i8 1, i8 13, i8 7, i8 31, i8 23, i8 43, i8 51, i8 3, i8 3, i8 17, i8 3, i8 11, i8 15, i8 3, i8 3>)
	ret <16 x i8> %fshr			ret <16 x i8> %fshr
	}			}

	define <16 x i8> @fshr_v16i8_3rd_arg_var(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {			define <16 x i8> @fshr_v16i8_3rd_arg_var(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {
	; CHECK-LABEL: 'fshr_v16i8_3rd_arg_var'			; CHECK-LABEL: 'fshr_v16i8_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 144 for instruction: %fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 118 for instruction: %fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)			%fshr = tail call <16 x i8> @llvm.fshr.v16i8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c)
	ret <16 x i8> %fshr			ret <16 x i8> %fshr
	}			}

	declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)			declare <16 x i8> @llvm.fshr.v16i8(<16 x i8>, <16 x i8>, <16 x i8>)
	;			;
	entry:			entry:
	%fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> <i16 3, i16 1, i16 13, i16 8, i16 7, i16 31, i16 43, i16 51>)			%fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> <i16 3, i16 1, i16 13, i16 8, i16 7, i16 31, i16 43, i16 51>)
	ret <8 x i16> %fshr			ret <8 x i16> %fshr
	}			}

	define <8 x i16> @fshr_v8i16_3rd_arg_var(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c) {			define <8 x i16> @fshr_v8i16_3rd_arg_var(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c) {
	; CHECK-LABEL: 'fshr_v8i16_3rd_arg_var'			; CHECK-LABEL: 'fshr_v8i16_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 72 for instruction: %fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)			%fshr = tail call <8 x i16> @llvm.fshr.v8i16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %c)
	ret <8 x i16> %fshr			ret <8 x i16> %fshr
	}			}

	declare <8 x i16> @llvm.fshr.v8i16(<8 x i16>, <8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.fshr.v8i16(<8 x i16>, <8 x i16>, <8 x i16>)
	;			;
	entry:			entry:
	%fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 3, i32 11, i32 2>)			%fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 1, i32 3, i32 11, i32 2>)
	ret <4 x i32> %fshr			ret <4 x i32> %fshr
	}			}

	define <4 x i32> @fshr_v4i32_3rd_arg_var(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {			define <4 x i32> @fshr_v4i32_3rd_arg_var(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
	; CHECK-LABEL: 'fshr_v4i32_3rd_arg_var'			; CHECK-LABEL: 'fshr_v4i32_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
				efriedma: Weird cost modeling.
				dmgreen (author): Yep, I think we have only looked at cost modelling for constant funnel shifts, and those were added fairly recently. I believe the codegen should also be improved for the variable case.
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)			%fshr = tail call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
	ret <4 x i32> %fshr			ret <4 x i32> %fshr
	}			}

	declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)
	;			;
	entry:			entry:
	%fshr = tail call <2 x i64> @llvm.fshr.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> <i64 1, i64 2>)			%fshr = tail call <2 x i64> @llvm.fshr.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> <i64 1, i64 2>)
	ret <2 x i64> %fshr			ret <2 x i64> %fshr
	}			}

	define <2 x i64> @fshr_v2i64_3rd_arg_var(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {			define <2 x i64> @fshr_v2i64_3rd_arg_var(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
	; CHECK-LABEL: 'fshr_v2i64_3rd_arg_var'			; CHECK-LABEL: 'fshr_v2i64_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fshr = tail call <2 x i64> @llvm.fshr.v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshr = tail call <2 x i64> @llvm.fshr.v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <2 x i64> @llvm.fshr.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)			%fshr = tail call <2 x i64> @llvm.fshr.v4i64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c)
	ret <2 x i64> %fshr			ret <2 x i64> %fshr
	}			}

	declare <2 x i64> @llvm.fshr.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)			declare <2 x i64> @llvm.fshr.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)

	define <4 x i30> @fshr_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {			define <4 x i30> @fshr_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
	; CHECK-LABEL: 'fshr_v4i30_3rd_arg_var'			; CHECK-LABEL: 'fshr_v4i30_3rd_arg_var'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)			; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)			%fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
	ret <4 x i30> %fshr			ret <4 x i30> %fshr
	}			}

	declare <4 x i30> @llvm.fshr.v4i30(<4 x i30>, <4 x i30>, <4 x i30>)			declare <4 x i30> @llvm.fshr.v4i30(<4 x i30>, <4 x i30>, <4 x i30>)

	define <2 x i66> @fshr_v2i66_3rd_arg_vec_const_lanes_different(<2 x i66> %a, <2 x i66> %b) {			define <2 x i66> @fshr_v2i66_3rd_arg_vec_const_lanes_different(<2 x i66> %a, <2 x i66> %b) {
	; CHECK-LABEL: 'fshr_v2i66_3rd_arg_vec_const_lanes_different'			; CHECK-LABEL: 'fshr_v2i66_3rd_arg_vec_const_lanes_different'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %fshr = tail call <2 x i66> @llvm.fshr.v2i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshr = tail call <2 x i66> @llvm.fshr.v2i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i66> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i66> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <2 x i66> @llvm.fshr.v4i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)			%fshr = tail call <2 x i66> @llvm.fshr.v4i66(<2 x i66> %a, <2 x i66> %b, <2 x i66> <i66 1, i66 2>)
	ret <2 x i66> %fshr			ret <2 x i66> %fshr
	}			}
	declare <2 x i66> @llvm.fshr.v4i66(<2 x i66>, <2 x i66>, <2 x i66>)			declare <2 x i66> @llvm.fshr.v4i66(<2 x i66>, <2 x i66>, <2 x i66>)

	define i66 @fshr_i66(i66 %a, i66 %b) {			define i66 @fshr_i66(i66 %a, i66 %b) {
	; CHECK-LABEL: 'fshr_i66'			; CHECK-LABEL: 'fshr_i66'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %fshr = tail call i66 @llvm.fshr.i66(i66 %a, i66 %b, i66 9)			; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %fshr = tail call i66 @llvm.fshr.i66(i66 %a, i66 %b, i66 9)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i66 %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i66 %fshr
	;			;
	entry:			entry:
	%fshr = tail call i66 @llvm.fshr.i66(i66 %a, i66 %b, i66 9)			%fshr = tail call i66 @llvm.fshr.i66(i66 %a, i66 %b, i66 9)
	ret i66 %fshr			ret i66 %fshr
	}			}

	declare i66 @llvm.fshr.i66(i66, i66, i66)			declare i66 @llvm.fshr.i66(i66, i66, i66)

	define <2 x i128> @fshr_v2i128_3rd_arg_vec_const_lanes_different(<2 x i128> %a, <2 x i128> %b) {			define <2 x i128> @fshr_v2i128_3rd_arg_vec_const_lanes_different(<2 x i128> %a, <2 x i128> %b) {
	; CHECK-LABEL: 'fshr_v2i128_3rd_arg_vec_const_lanes_different'			; CHECK-LABEL: 'fshr_v2i128_3rd_arg_vec_const_lanes_different'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %fshr = tail call <2 x i128> @llvm.fshr.v2i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)			; CHECK-NEXT: Cost Model: Found an estimated cost of 32 for instruction: %fshr = tail call <2 x i128> @llvm.fshr.v2i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i128> %fshr			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i128> %fshr
	;			;
	entry:			entry:
	%fshr = tail call <2 x i128> @llvm.fshr.v4i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)			%fshr = tail call <2 x i128> @llvm.fshr.v4i128(<2 x i128> %a, <2 x i128> %b, <2 x i128> <i128 1, i128 2>)
	ret <2 x i128> %fshr			ret <2 x i128> %fshr
	}			}
	declare <2 x i128> @llvm.fshr.v4i128(<2 x i128>, <2 x i128>, <2 x i128>)			declare <2 x i128> @llvm.fshr.v4i128(<2 x i128>, <2 x i128>, <2 x i128>)


llvm/test/Analysis/CostModel/AArch64/getIntrinsicInstrCost-vector-reverse.ll

	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <2 x i64> @llvm.experimental.vector.reverse.v2i64(<2 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = call <2 x i64> @llvm.experimental.vector.reverse.v2i64(<2 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <4 x i64> @llvm.experimental.vector.reverse.v4i64(<4 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <4 x i64> @llvm.experimental.vector.reverse.v4i64(<4 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <8 x half> @llvm.experimental.vector.reverse.v8f16(<8 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <8 x half> @llvm.experimental.vector.reverse.v8f16(<8 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %10 = call <16 x half> @llvm.experimental.vector.reverse.v16f16(<16 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %10 = call <16 x half> @llvm.experimental.vector.reverse.v16f16(<16 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <4 x float> @llvm.experimental.vector.reverse.v4f32(<4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <4 x float> @llvm.experimental.vector.reverse.v4f32(<4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %12 = call <8 x float> @llvm.experimental.vector.reverse.v8f32(<8 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %12 = call <8 x float> @llvm.experimental.vector.reverse.v8f32(<8 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <2 x double> @llvm.experimental.vector.reverse.v2f64(<2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %13 = call <2 x double> @llvm.experimental.vector.reverse.v2f64(<2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <4 x double> @llvm.experimental.vector.reverse.v4f64(<4 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <4 x double> @llvm.experimental.vector.reverse.v4f64(<4 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %15 = call <8 x bfloat> @llvm.experimental.vector.reverse.v8bf16(<8 x bfloat> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %15 = call <8 x bfloat> @llvm.experimental.vector.reverse.v8bf16(<8 x bfloat> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %16 = call <16 x bfloat> @llvm.experimental.vector.reverse.v16bf16(<16 x bfloat> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %16 = call <16 x bfloat> @llvm.experimental.vector.reverse.v16bf16(<16 x bfloat> undef)
				efriedma: Weird cost modeling.
				dmgreen (author): I agree, but the codegen looks odd without +bf16 too: https://godbolt.org/z/oTG5ae6nP
				efriedma: Still way overestimating the cost... but yes.
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;

	call <16 x i8> @llvm.experimental.vector.reverse.v16i8(<16 x i8> undef)			call <16 x i8> @llvm.experimental.vector.reverse.v16i8(<16 x i8> undef)
	call <32 x i8> @llvm.experimental.vector.reverse.v32i8(<32 x i8> undef)			call <32 x i8> @llvm.experimental.vector.reverse.v32i8(<32 x i8> undef)
	call <8 x i16> @llvm.experimental.vector.reverse.v8i16(<8 x i16> undef)			call <8 x i16> @llvm.experimental.vector.reverse.v8i16(<8 x i16> undef)
	call <16 x i16> @llvm.experimental.vector.reverse.v16i16(<16 x i16> undef)			call <16 x i16> @llvm.experimental.vector.reverse.v16i16(<16 x i16> undef)
	call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> undef)			call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> undef)

llvm/test/Analysis/CostModel/AArch64/insert-extract.ll

; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-n1 | FileCheck %s		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-n1 | FileCheck %s
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-n2 | FileCheck %s		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-n2 | FileCheck %s
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-v1 | FileCheck %s		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-v1 | FileCheck %s
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-v2 | FileCheck %s		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=neoverse-v2 | FileCheck %s
; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=kryo | FileCheck %s --check-prefix=KRYO		; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mcpu=kryo | FileCheck %s --check-prefix=KRYO

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"		target triple = "aarch64--linux-gnu"

define void @vectorInstrCost() {		define void @vectorInstrCost() {
; CHECK-LABEL: 'vectorInstrCost'		; CHECK-LABEL: 'vectorInstrCost'
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %ta0 = extractelement <8 x i1> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta0 = extractelement <8 x i1> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %ta1 = extractelement <8 x i1> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta1 = extractelement <8 x i1> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t1 = extractelement <8 x i8> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t1 = extractelement <8 x i8> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t2 = extractelement <8 x i8> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t2 = extractelement <8 x i8> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t3 = extractelement <4 x i16> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t3 = extractelement <4 x i16> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t4 = extractelement <4 x i16> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t4 = extractelement <4 x i16> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t5 = extractelement <2 x i32> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t5 = extractelement <2 x i32> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t6 = extractelement <2 x i32> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t6 = extractelement <2 x i32> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t7 = extractelement <2 x i64> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t7 = extractelement <2 x i64> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t8 = extractelement <2 x i64> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t8 = extractelement <2 x i64> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t9 = extractelement <4 x half> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t9 = extractelement <4 x half> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t10 = extractelement <4 x half> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t10 = extractelement <4 x half> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t11 = extractelement <2 x float> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t11 = extractelement <2 x float> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t12 = extractelement <2 x float> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t12 = extractelement <2 x float> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t13 = extractelement <2 x double> undef, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t13 = extractelement <2 x double> undef, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t14 = extractelement <2 x double> undef, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t14 = extractelement <2 x double> undef, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t31 = insertelement <8 x i1> undef, i1 false, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t31 = insertelement <8 x i1> undef, i1 false, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t41 = insertelement <8 x i1> undef, i1 true, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t41 = insertelement <8 x i1> undef, i1 true, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t30 = insertelement <8 x i8> undef, i8 0, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t30 = insertelement <8 x i8> undef, i8 0, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t40 = insertelement <8 x i8> undef, i8 1, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t40 = insertelement <8 x i8> undef, i8 1, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t50 = insertelement <4 x i16> undef, i16 2, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t50 = insertelement <4 x i16> undef, i16 2, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t60 = insertelement <4 x i16> undef, i16 3, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t60 = insertelement <4 x i16> undef, i16 3, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t70 = insertelement <2 x i32> undef, i32 4, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t70 = insertelement <2 x i32> undef, i32 4, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t80 = insertelement <2 x i32> undef, i32 5, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t80 = insertelement <2 x i32> undef, i32 5, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t90 = insertelement <2 x i64> undef, i64 6, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t90 = insertelement <2 x i64> undef, i64 6, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t100 = insertelement <2 x i64> undef, i64 7, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t100 = insertelement <2 x i64> undef, i64 7, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t110 = insertelement <4 x half> zeroinitializer, half 0xH0000, i64 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t110 = insertelement <4 x half> zeroinitializer, half 0xH0000, i64 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t120 = insertelement <4 x half> zeroinitializer, half 0xH0000, i64 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t120 = insertelement <4 x half> zeroinitializer, half 0xH0000, i64 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t130 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t130 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t140 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t140 = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t150 = insertelement <2 x double> zeroinitializer, double 0.000000e+00, i64 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %t150 = insertelement <2 x double> zeroinitializer, double 0.000000e+00, i64 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %t160 = insertelement <2 x double> zeroinitializer, double 0.000000e+00, i64 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t160 = insertelement <2 x double> zeroinitializer, double 0.000000e+00, i64 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; KRYO-LABEL: 'vectorInstrCost'		; KRYO-LABEL: 'vectorInstrCost'
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta0 = extractelement <8 x i1> undef, i32 0		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta0 = extractelement <8 x i1> undef, i32 0
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta1 = extractelement <8 x i1> undef, i32 1		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %ta1 = extractelement <8 x i1> undef, i32 1
; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t1 = extractelement <8 x i8> undef, i32 0		; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t1 = extractelement <8 x i8> undef, i32 0
; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t2 = extractelement <8 x i8> undef, i32 1		; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t2 = extractelement <8 x i8> undef, i32 1
; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t3 = extractelement <4 x i16> undef, i32 0		; KRYO-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t3 = extractelement <4 x i16> undef, i32 0
(63 lines not shown)
ret void		ret void
}		}

;; LD1: Load one single-element structure to one lane of one register.		;; LD1: Load one single-element structure to one lane of one register.

define <8 x i8> @LD1_B(<8 x i8> %vec, ptr noundef %i) {		define <8 x i8> @LD1_B(<8 x i8> %vec, ptr noundef %i) {
; CHECK-LABEL: 'LD1_B'		; CHECK-LABEL: 'LD1_B'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i8, ptr %i, align 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i8, ptr %i, align 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %v2		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %v2
;		;
; KRYO-LABEL: 'LD1_B'		; KRYO-LABEL: 'LD1_B'
; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i8, ptr %i, align 1		; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i8, ptr %i, align 1
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1
; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %v2		; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %v2
;		;
entry:		entry:
%v1 = load i8, ptr %i, align 1		%v1 = load i8, ptr %i, align 1
%v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1		%v2 = insertelement <8 x i8> %vec, i8 %v1, i32 1
ret <8 x i8> %v2		ret <8 x i8> %v2
}		}

define <4 x i16> @LD1_H(<4 x i16> %vec, ptr noundef %i) {		define <4 x i16> @LD1_H(<4 x i16> %vec, ptr noundef %i) {
; CHECK-LABEL: 'LD1_H'		; CHECK-LABEL: 'LD1_H'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i16, ptr %i, align 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i16, ptr %i, align 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %v2		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %v2
;		;
; KRYO-LABEL: 'LD1_H'		; KRYO-LABEL: 'LD1_H'
; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i16, ptr %i, align 2		; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i16, ptr %i, align 2
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2
; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %v2		; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %v2
;		;
entry:		entry:
%v1 = load i16, ptr %i, align 2		%v1 = load i16, ptr %i, align 2
%v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2		%v2 = insertelement <4 x i16> %vec, i16 %v1, i32 2
ret <4 x i16> %v2		ret <4 x i16> %v2
}		}

define <4 x i32> @LD1_W(<4 x i32> %vec, ptr noundef %i) {		define <4 x i32> @LD1_W(<4 x i32> %vec, ptr noundef %i) {
; CHECK-LABEL: 'LD1_W'		; CHECK-LABEL: 'LD1_W'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i32, ptr %i, align 4		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i32, ptr %i, align 4
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %v2		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %v2
;		;
; KRYO-LABEL: 'LD1_W'		; KRYO-LABEL: 'LD1_W'
; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i32, ptr %i, align 4		; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i32, ptr %i, align 4
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3
; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %v2		; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %v2
;		;
entry:		entry:
%v1 = load i32, ptr %i, align 4		%v1 = load i32, ptr %i, align 4
%v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3		%v2 = insertelement <4 x i32> %vec, i32 %v1, i32 3
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <2 x i64> @LD1_X(<2 x i64> %vec, ptr noundef %i) {		define <2 x i64> @LD1_X(<2 x i64> %vec, ptr noundef %i) {
; CHECK-LABEL: 'LD1_X'		; CHECK-LABEL: 'LD1_X'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i64, ptr %i, align 8		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i64, ptr %i, align 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %v2		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %v2
;		;
; KRYO-LABEL: 'LD1_X'		; KRYO-LABEL: 'LD1_X'
; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i64, ptr %i, align 8		; KRYO-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v1 = load i64, ptr %i, align 8
; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0		; KRYO-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0
; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %v2		; KRYO-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %v2
;		;
entry:		entry:
%v1 = load i64, ptr %i, align 8		%v1 = load i64, ptr %i, align 8
%v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0		%v2 = insertelement <2 x i64> %vec, i64 %v1, i32 0
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}
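
When reading a side-by-side listing like the one above, it can help to pull the two costs out of each `Cost Model:` line mechanically. The following is a minimal sketch and not part of the patch: `cost_change` is a hypothetical helper name, and it assumes the left column is the old expectation and the right column is the new one, which matches the summary's description of insert/extract costs dropping from 3 to 2.

```python
import re

# Matches the cost in lines printed by opt's print<cost-model> pass, e.g.
# "Cost Model: Found an estimated cost of 3 for instruction: ..."
COST_RE = re.compile(r"Found an estimated cost of (\d+) for instruction:")


def cost_change(side_by_side_line):
    """Return (old, new) costs from one side-by-side diff line, or None.

    Assumes the line holds the old text followed by the new text, so a
    line with exactly two cost matches is an (old, new) pair.
    """
    costs = [int(c) for c in COST_RE.findall(side_by_side_line)]
    if len(costs) != 2:
        return None
    return costs[0], costs[1]


# One line taken from the insert-extract.ll listing (old column, then new).
line = (
    "; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: "
    "%t1 = extractelement <8 x i8> undef, i32 0\t\t"
    "; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: "
    "%t1 = extractelement <8 x i8> undef, i32 0"
)
old, new = cost_change(line)
print(f"{old} -> {new}")  # prints "3 -> 2"
```

Lines without a cost annotation (labels, `entry:`, the IR itself) yield `None`, so the helper can be applied to every line of a listing and the results filtered.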

llvm/test/Analysis/CostModel/AArch64/masked_ldst.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s			; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64-linux-gnu -mattr=+sve | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	define void @fixed() {			define void @fixed() {
	; CHECK-LABEL: 'fixed'			; CHECK-LABEL: 'fixed'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i8 = call <2 x i8> @llvm.masked.load.v2i8.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i8> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v2i8 = call <2 x i8> @llvm.masked.load.v2i8.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i8> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v4i8 = call <4 x i8> @llvm.masked.load.v4i8.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i8> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v4i8 = call <4 x i8> @llvm.masked.load.v4i8.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i8> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 53 for instruction: %v8i8 = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i8> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %v8i8 = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i8> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 109 for instruction: %v16i8 = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr undef, i32 8, <16 x i1> undef, <16 x i8> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v16i8 = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr undef, i32 8, <16 x i1> undef, <16 x i8> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i16 = call <2 x i16> @llvm.masked.load.v2i16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v2i16 = call <2 x i16> @llvm.masked.load.v2i16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v4i16 = call <4 x i16> @llvm.masked.load.v4i16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v4i16 = call <4 x i16> @llvm.masked.load.v4i16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 53 for instruction: %v8i16 = call <8 x i16> @llvm.masked.load.v8i16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i16> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %v8i16 = call <8 x i16> @llvm.masked.load.v8i16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i16> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i32 = call <2 x i32> @llvm.masked.load.v2i32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v2i32 = call <2 x i32> @llvm.masked.load.v2i32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v4i32 = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i32> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v4i32 = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i32> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2i64 = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v2i64 = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2f16 = call <2 x half> @llvm.masked.load.v2f16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f16 = call <2 x half> @llvm.masked.load.v2f16.p0(ptr undef, i32 8, <2 x i1> undef, <2 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v4f16 = call <4 x half> @llvm.masked.load.v4f16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %v4f16 = call <4 x half> @llvm.masked.load.v4f16.p0(ptr undef, i32 8, <4 x i1> undef, <4 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 53 for instruction: %v8f16 = call <8 x half> @llvm.masked.load.v8f16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %v8f16 = call <8 x half> @llvm.masked.load.v8f16.p0(ptr undef, i32 8, <8 x i1> undef, <8 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2f32 = call <2 x float> @llvm.masked.load.v2f32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f32 = call <2 x float> @llvm.masked.load.v2f32.p0(ptr undef, i32 8, <2 x i1> undef, <2 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %v4f32 = call <4 x float> @llvm.masked.load.v4f32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %v4f32 = call <4 x float> @llvm.masked.load.v4f32.p0(ptr undef, i32 8, <4 x i1> undef, <4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %v2f64 = call <2 x double> @llvm.masked.load.v2f64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %v2f64 = call <2 x double> @llvm.masked.load.v2f64.p0(ptr undef, i32 8, <2 x i1> undef, <2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %v4i64 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i64> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %v4i64 = call <4 x i64> @llvm.masked.load.v4i64.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i64> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 212 for instruction: %v32f16 = call <32 x half> @llvm.masked.load.v32f16.p0(ptr undef, i32 8, <32 x i1> undef, <32 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 184 for instruction: %v32f16 = call <32 x half> @llvm.masked.load.v32f16.p0(ptr undef, i32 8, <32 x i1> undef, <32 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	entry:			entry:
	; Legal fixed-width integer types			; Legal fixed-width integer types
	%v2i8 = call <2 x i8> @llvm.masked.load.v2i8.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i8> undef)			%v2i8 = call <2 x i8> @llvm.masked.load.v2i8.p0(ptr undef, i32 8, <2 x i1> undef, <2 x i8> undef)
	%v4i8 = call <4 x i8> @llvm.masked.load.v4i8.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i8> undef)			%v4i8 = call <4 x i8> @llvm.masked.load.v4i8.p0(ptr undef, i32 8, <4 x i1> undef, <4 x i8> undef)
	%v8i8 = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i8> undef)			%v8i8 = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr undef, i32 8, <8 x i1> undef, <8 x i8> undef)
	%v16i8 = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr undef, i32 8, <16 x i1> undef, <16 x i8> undef)			%v16i8 = call <16 x i8> @llvm.masked.load.v16i8.p0(ptr undef, i32 8, <16 x i1> undef, <16 x i8> undef)
(223 lines not shown)

llvm/test/Analysis/CostModel/AArch64/mem-op-cost-model.ll

(184 lines not shown)
%out = load <8 x i64>, ptr %ptr		%out = load <8 x i64>, ptr %ptr
ret <8 x i64> %out		ret <8 x i64> %out
}		}

declare <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr>, i32 immarg, <4 x i1>, <4 x i8>)		declare <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr>, i32 immarg, <4 x i1>, <4 x i8>)
define <4 x i8> @gather_load_4xi8_constant_mask(<4 x ptr> %ptrs) {		define <4 x i8> @gather_load_4xi8_constant_mask(<4 x ptr> %ptrs) {
; CHECK: gather_load_4xi8_constant_mask		; CHECK: gather_load_4xi8_constant_mask
; CHECK-NEON-LABEL: 'gather_load_4xi8_constant_mask'		; CHECK-NEON-LABEL: 'gather_load_4xi8_constant_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-128-LABEL: 'gather_load_4xi8_constant_mask'		; CHECK-SVE-128-LABEL: 'gather_load_4xi8_constant_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-256-LABEL: 'gather_load_4xi8_constant_mask'		; CHECK-SVE-256-LABEL: 'gather_load_4xi8_constant_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-512-LABEL: 'gather_load_4xi8_constant_mask'		; CHECK-SVE-512-LABEL: 'gather_load_4xi8_constant_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
%lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)		%lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i8> undef)
ret <4 x i8> %lv		ret <4 x i8> %lv
}		}

define <4 x i8> @gather_load_4xi8_variable_mask(<4 x ptr> %ptrs, <4 x i1> %cond) {		define <4 x i8> @gather_load_4xi8_variable_mask(<4 x ptr> %ptrs, <4 x i1> %cond) {
; CHECK: gather_load_4xi8_variable_mask		; CHECK: gather_load_4xi8_variable_mask
; CHECK-NEON-LABEL: 'gather_load_4xi8_variable_mask'		; CHECK-NEON-LABEL: 'gather_load_4xi8_variable_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-128-LABEL: 'gather_load_4xi8_variable_mask'		; CHECK-SVE-128-LABEL: 'gather_load_4xi8_variable_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-256-LABEL: 'gather_load_4xi8_variable_mask'		; CHECK-SVE-256-LABEL: 'gather_load_4xi8_variable_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
; CHECK-SVE-512-LABEL: 'gather_load_4xi8_variable_mask'		; CHECK-SVE-512-LABEL: 'gather_load_4xi8_variable_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i8> %lv
;		;
%lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)		%lv = call <4 x i8> @llvm.masked.gather.v4i8.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i8> undef)
ret <4 x i8> %lv		ret <4 x i8> %lv
}		}

declare void @llvm.masked.scatter.v4i8.v4p0(<4 x i8>, <4 x ptr>, i32 immarg, <4 x i1>)		declare void @llvm.masked.scatter.v4i8.v4p0(<4 x i8>, <4 x ptr>, i32 immarg, <4 x i1>)
define void @scatter_store_4xi8_constant_mask(<4 x i8> %val, <4 x ptr> %ptrs) {		define void @scatter_store_4xi8_constant_mask(<4 x i8> %val, <4 x ptr> %ptrs) {
; CHECK: scatter_store_4xi8_constant_mask		; CHECK: scatter_store_4xi8_constant_mask
; CHECK-NEON-LABEL: 'scatter_store_4xi8_constant_mask'		; CHECK-NEON-LABEL: 'scatter_store_4xi8_constant_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'scatter_store_4xi8_constant_mask'		; CHECK-SVE-128-LABEL: 'scatter_store_4xi8_constant_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'scatter_store_4xi8_constant_mask'		; CHECK-SVE-256-LABEL: 'scatter_store_4xi8_constant_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'scatter_store_4xi8_constant_mask'		; CHECK-SVE-512-LABEL: 'scatter_store_4xi8_constant_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
ret void		ret void
}		}

define void @scatter_store_4xi8_variable_mask(<4 x i8> %val, <4 x ptr> %ptrs, <4 x i1> %cond) {		define void @scatter_store_4xi8_variable_mask(<4 x i8> %val, <4 x ptr> %ptrs, <4 x i1> %cond) {
; CHECK: scatter_store_4xi8_variable_mask		; CHECK: scatter_store_4xi8_variable_mask
; CHECK-NEON-LABEL: 'scatter_store_4xi8_variable_mask'		; CHECK-NEON-LABEL: 'scatter_store_4xi8_variable_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'scatter_store_4xi8_variable_mask'		; CHECK-SVE-128-LABEL: 'scatter_store_4xi8_variable_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'scatter_store_4xi8_variable_mask'		; CHECK-SVE-256-LABEL: 'scatter_store_4xi8_variable_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'scatter_store_4xi8_variable_mask'		; CHECK-SVE-512-LABEL: 'scatter_store_4xi8_variable_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		call void @llvm.masked.scatter.v4i8.v4p0(<4 x i8> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
ret void		ret void
}		}

declare <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr>, i32 immarg, <4 x i1>, <4 x i32>)		declare <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr>, i32 immarg, <4 x i1>, <4 x i32>)
define <4 x i32> @gather_load_4xi32_constant_mask(<4 x ptr> %ptrs) {		define <4 x i32> @gather_load_4xi32_constant_mask(<4 x ptr> %ptrs) {
; CHECK: gather_load_4xi32_constant_mask		; CHECK: gather_load_4xi32_constant_mask
; CHECK-NEON-LABEL: 'gather_load_4xi32_constant_mask'		; CHECK-NEON-LABEL: 'gather_load_4xi32_constant_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-128-LABEL: 'gather_load_4xi32_constant_mask'		; CHECK-SVE-128-LABEL: 'gather_load_4xi32_constant_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-256-LABEL: 'gather_load_4xi32_constant_mask'		; CHECK-SVE-256-LABEL: 'gather_load_4xi32_constant_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-512-LABEL: 'gather_load_4xi32_constant_mask'		; CHECK-SVE-512-LABEL: 'gather_load_4xi32_constant_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
%lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)		%lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
ret <4 x i32> %lv		ret <4 x i32> %lv
}		}

define <4 x i32> @gather_load_4xi32_variable_mask(<4 x ptr> %ptrs, <4 x i1> %cond) {		define <4 x i32> @gather_load_4xi32_variable_mask(<4 x ptr> %ptrs, <4 x i1> %cond) {
; CHECK: gather_load_4xi32_variable_mask		; CHECK: gather_load_4xi32_variable_mask
; CHECK-NEON-LABEL: 'gather_load_4xi32_variable_mask'		; CHECK-NEON-LABEL: 'gather_load_4xi32_variable_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-128-LABEL: 'gather_load_4xi32_variable_mask'		; CHECK-SVE-128-LABEL: 'gather_load_4xi32_variable_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-256-LABEL: 'gather_load_4xi32_variable_mask'		; CHECK-SVE-256-LABEL: 'gather_load_4xi32_variable_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
; CHECK-SVE-512-LABEL: 'gather_load_4xi32_variable_mask'		; CHECK-SVE-512-LABEL: 'gather_load_4xi32_variable_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lv
;		;
%lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)		%lv = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ptrs, i32 1, <4 x i1> %cond, <4 x i32> undef)
ret <4 x i32> %lv		ret <4 x i32> %lv
}		}

declare void @llvm.masked.scatter.v4i32.v4p0(<4 x i32>, <4 x ptr>, i32 immarg, <4 x i1>)		declare void @llvm.masked.scatter.v4i32.v4p0(<4 x i32>, <4 x ptr>, i32 immarg, <4 x i1>)
define void @scatter_store_4xi32_constant_mask(<4 x i32> %val, <4 x ptr> %ptrs) {		define void @scatter_store_4xi32_constant_mask(<4 x i32> %val, <4 x ptr> %ptrs) {
; CHECK: scatter_store_4xi32_constant_mask		; CHECK: scatter_store_4xi32_constant_mask
; CHECK-NEON-LABEL: 'scatter_store_4xi32_constant_mask'		; CHECK-NEON-LABEL: 'scatter_store_4xi32_constant_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'scatter_store_4xi32_constant_mask'		; CHECK-SVE-128-LABEL: 'scatter_store_4xi32_constant_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'scatter_store_4xi32_constant_mask'		; CHECK-SVE-256-LABEL: 'scatter_store_4xi32_constant_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'scatter_store_4xi32_constant_mask'		; CHECK-SVE-512-LABEL: 'scatter_store_4xi32_constant_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
ret void		ret void
}		}

define void @scatter_store_4xi32_variable_mask(<4 x i32> %val, <4 x ptr> %ptrs, <4 x i1> %cond) {		define void @scatter_store_4xi32_variable_mask(<4 x i32> %val, <4 x ptr> %ptrs, <4 x i1> %cond) {
; CHECK: scatter_store_4xi32_variable_mask		; CHECK: scatter_store_4xi32_variable_mask
; CHECK-NEON-LABEL: 'scatter_store_4xi32_variable_mask'		; CHECK-NEON-LABEL: 'scatter_store_4xi32_variable_mask'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'scatter_store_4xi32_variable_mask'		; CHECK-SVE-128-LABEL: 'scatter_store_4xi32_variable_mask'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'scatter_store_4xi32_variable_mask'		; CHECK-SVE-256-LABEL: 'scatter_store_4xi32_variable_mask'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'scatter_store_4xi32_variable_mask'		; CHECK-SVE-512-LABEL: 'scatter_store_4xi32_variable_mask'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 40 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)		call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %val, <4 x ptr> %ptrs, i32 1, <4 x i1> %cond)
ret void		ret void
}		}

declare <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr>, i32, <256 x i1>, <256 x i16>)		declare <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr>, i32, <256 x i1>, <256 x i16>)
define void @sve_gather_vls(<256 x i1> %v256i1mask) {		define void @sve_gather_vls(<256 x i1> %v256i1mask) {
; CHECK-LABEL: 'sve_scatter_vls'		; CHECK-LABEL: 'sve_scatter_vls'
; CHECK-NEON-LABEL: 'sve_gather_vls'		; CHECK-NEON-LABEL: 'sve_gather_vls'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1952 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1792 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'sve_gather_vls'		; CHECK-SVE-128-LABEL: 'sve_gather_vls'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 1952 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 1792 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'sve_gather_vls'		; CHECK-SVE-256-LABEL: 'sve_gather_vls'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'sve_gather_vls'		; CHECK-SVE-512-LABEL: 'sve_gather_vls'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
entry:		entry:
%res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)		%res.v256i16 = call <256 x i16> @llvm.masked.gather.v256i16.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x i16> zeroinitializer)
ret void		ret void
}		}

declare <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr>, i32, <256 x i1>, <256 x float>)		declare <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr>, i32, <256 x i1>, <256 x float>)
define void @sve_gather_vls_float(<256 x i1> %v256i1mask) {		define void @sve_gather_vls_float(<256 x i1> %v256i1mask) {
; CHECK-LABEL: 'sve_gather_vls_float'		; CHECK-LABEL: 'sve_gather_vls_float'
; CHECK-NEON-LABEL: 'sve_gather_vls_float'		; CHECK-NEON-LABEL: 'sve_gather_vls_float'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1856 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1664 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'sve_gather_vls_float'		; CHECK-SVE-128-LABEL: 'sve_gather_vls_float'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 1856 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 1664 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'sve_gather_vls_float'		; CHECK-SVE-256-LABEL: 'sve_gather_vls_float'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'sve_gather_vls_float'		; CHECK-SVE-512-LABEL: 'sve_gather_vls_float'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: %res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
entry:		entry:
%res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)		%res.v256f32 = call <256 x float> @llvm.masked.gather.v256f32.v256p0(<256 x ptr> undef, i32 0, <256 x i1> %v256i1mask, <256 x float> zeroinitializer)
ret void		ret void
}		}

declare void @llvm.masked.scatter.v256i8.v256p0(<256 x i8>, <256 x ptr>, i32, <256 x i1>)		declare void @llvm.masked.scatter.v256i8.v256p0(<256 x i8>, <256 x ptr>, i32, <256 x i1>)
define void @sve_scatter_vls(<256 x i1> %v256i1mask){		define void @sve_scatter_vls(<256 x i1> %v256i1mask){
; CHECK-LABEL: 'sve_scatter_vls'		; CHECK-LABEL: 'sve_scatter_vls'
; CHECK-NEON-LABEL: 'sve_scatter_vls'		; CHECK-NEON-LABEL: 'sve_scatter_vls'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 2000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 1792 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'sve_scatter_vls'		; CHECK-SVE-128-LABEL: 'sve_scatter_vls'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 2000 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 1792 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'sve_scatter_vls'		; CHECK-SVE-256-LABEL: 'sve_scatter_vls'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'sve_scatter_vls'		; CHECK-SVE-512-LABEL: 'sve_scatter_vls'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 2560 for instruction: call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
entry:		entry:
call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)		call void @llvm.masked.scatter.v256i8.v256p0(<256 x i8> undef, <256 x ptr> undef, i32 0, <256 x i1> %v256i1mask)
ret void		ret void
}		}

declare void @llvm.masked.scatter.v512f16.v512p0(<512 x half>, <512 x ptr>, i32, <512 x i1>)		declare void @llvm.masked.scatter.v512f16.v512p0(<512 x half>, <512 x ptr>, i32, <512 x i1>)
define void @sve_scatter_vls_float(<512 x i1> %v512i1mask){		define void @sve_scatter_vls_float(<512 x i1> %v512i1mask){
; CHECK-LABEL: 'sve_scatter_vls_float'		; CHECK-LABEL: 'sve_scatter_vls_float'
; CHECK-NEON-LABEL: 'sve_scatter_vls_float'		; CHECK-NEON-LABEL: 'sve_scatter_vls_float'
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 3904 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 3456 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)
; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-128-LABEL: 'sve_scatter_vls_float'		; CHECK-SVE-128-LABEL: 'sve_scatter_vls_float'
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 3904 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 3456 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)
; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-128-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-256-LABEL: 'sve_scatter_vls_float'		; CHECK-SVE-256-LABEL: 'sve_scatter_vls_float'
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)
; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-SVE-512-LABEL: 'sve_scatter_vls_float'		; CHECK-SVE-512-LABEL: 'sve_scatter_vls_float'
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 5120 for instruction: call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)
; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-SVE-512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)		call void @llvm.masked.scatter.v512f16.v512p0(<512 x half> undef, <512 x ptr> undef, i32 0, <512 x i1> %v512i1mask)
ret void		ret void
}		}

llvm/test/Analysis/CostModel/AArch64/min-max.ll

%V2i64 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> undef, <2 x i64> undef)		%V2i64 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> undef, <2 x i64> undef)
%V4i64 = call <4 x i64> @llvm.smax.v4i64(<4 x i64> undef, <4 x i64> undef)		%V4i64 = call <4 x i64> @llvm.smax.v4i64(<4 x i64> undef, <4 x i64> undef)
ret void		ret void
}		}

define void @minnum16() {		define void @minnum16() {
; CHECK-NOF16-LABEL: 'minnum16'		; CHECK-NOF16-LABEL: 'minnum16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minnum.f16(half undef, half undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minnum.f16(half undef, half undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'minnum16'		; CHECK-F16-LABEL: 'minnum16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minnum.f16(half undef, half undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minnum.f16(half undef, half undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
%f16 = call half @llvm.minnum.f16(half undef, half undef)		%f16 = call half @llvm.minnum.f16(half undef, half undef)
%V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)		%V2f16 = call <2 x half> @llvm.minnum.v2f16(<2 x half> undef, <2 x half> undef)
%V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)		%V4f16 = call <4 x half> @llvm.minnum.v4f16(<4 x half> undef, <4 x half> undef)
%V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)		%V8f16 = call <8 x half> @llvm.minnum.v8f16(<8 x half> undef, <8 x half> undef)
%V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)		%V16f16 = call <16 x half> @llvm.minnum.v16f16(<16 x half> undef, <16 x half> undef)
ret void		ret void
}		}

define void @maxnum16() {		define void @maxnum16() {
; CHECK-NOF16-LABEL: 'maxnum16'		; CHECK-NOF16-LABEL: 'maxnum16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maxnum.f16(half undef, half undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maxnum.f16(half undef, half undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16 = call <2 x half> @llvm.maxnum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2f16 = call <2 x half> @llvm.maxnum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16 = call <4 x half> @llvm.maxnum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4f16 = call <4 x half> @llvm.maxnum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16 = call <8 x half> @llvm.maxnum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8f16 = call <8 x half> @llvm.maxnum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16 = call <16 x half> @llvm.maxnum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16f16 = call <16 x half> @llvm.maxnum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'maxnum16'		; CHECK-F16-LABEL: 'maxnum16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maxnum.f16(half undef, half undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maxnum.f16(half undef, half undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.maxnum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.maxnum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.maxnum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.maxnum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.maxnum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.maxnum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.maxnum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.maxnum.v16f16(<16 x half> undef, <16 x half> undef)
%V4f64 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> undef, <4 x double> undef)		%V4f64 = call <4 x double> @llvm.maxnum.v4f64(<4 x double> undef, <4 x double> undef)
ret void		ret void
}		}


define void @minimum16() {		define void @minimum16() {
; CHECK-NOF16-LABEL: 'minimum16'		; CHECK-NOF16-LABEL: 'minimum16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'minimum16'		; CHECK-F16-LABEL: 'minimum16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.minimum.f16(half undef, half undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
%f16 = call half @llvm.minimum.f16(half undef, half undef)		%f16 = call half @llvm.minimum.f16(half undef, half undef)
%V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)		%V2f16 = call <2 x half> @llvm.minimum.v2f16(<2 x half> undef, <2 x half> undef)
%V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)		%V4f16 = call <4 x half> @llvm.minimum.v4f16(<4 x half> undef, <4 x half> undef)
%V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)		%V8f16 = call <8 x half> @llvm.minimum.v8f16(<8 x half> undef, <8 x half> undef)
%V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)		%V16f16 = call <16 x half> @llvm.minimum.v16f16(<16 x half> undef, <16 x half> undef)
ret void		ret void
}		}

define void @maximum16() {		define void @maximum16() {
; CHECK-NOF16-LABEL: 'maximum16'		; CHECK-NOF16-LABEL: 'maximum16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'maximum16'		; CHECK-F16-LABEL: 'maximum16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)

llvm/test/Analysis/CostModel/AArch64/reduce-fadd.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu < %s \| FileCheck %s			; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu < %s \| FileCheck %s
	; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu -mattr=+fullfp16 < %s \| FileCheck %s --check-prefix=FP16			; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu -mattr=+fullfp16 < %s \| FileCheck %s --check-prefix=FP16
	; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu -mattr=+bf16 < %s \| FileCheck %s --check-prefix=BF16			; RUN: opt -passes='print<cost-model>' 2>&1 -disable-output -mtriple=aarch64--linux-gnu -mattr=+bf16 < %s \| FileCheck %s --check-prefix=BF16

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	define void @strict_fp_reductions() {			define void @strict_fp_reductions() {
	; CHECK-LABEL: 'strict_fp_reductions'			; CHECK-LABEL: 'strict_fp_reductions'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; FP16-LABEL: 'strict_fp_reductions'			; FP16-LABEL: 'strict_fp_reductions'
	; FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; BF16-LABEL: 'strict_fp_reductions'			; BF16-LABEL: 'strict_fp_reductions'
	; BF16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 45 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR0000, <4 x bfloat> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; BF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)			%fadd_v4f16 = call half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)
	%fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0.0, <8 x half> undef)			%fadd_v8f16 = call half @llvm.vector.reduce.fadd.v8f16(half 0.0, <8 x half> undef)
	%fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.0, <4 x float> undef)			%fadd_v4f32 = call float @llvm.vector.reduce.fadd.v4f32(float 0.0, <4 x float> undef)
	%fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.0, <8 x float> undef)			%fadd_v8f32 = call float @llvm.vector.reduce.fadd.v8f32(float 0.0, <8 x float> undef)
	%fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.0, <2 x double> undef)			%fadd_v2f64 = call double @llvm.vector.reduce.fadd.v2f64(double 0.0, <2 x double> undef)
	%fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.0, <4 x double> undef)			%fadd_v4f64 = call double @llvm.vector.reduce.fadd.v4f64(double 0.0, <4 x double> undef)
	%fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4f8(bfloat 0.0, <4 x bfloat> undef)			%fadd_v4f8 = call bfloat @llvm.vector.reduce.fadd.v4f8(bfloat 0.0, <4 x bfloat> undef)
	%fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			%fadd_v4f128 = call fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)

	ret void			ret void
	}			}
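
Reader's note on the strict_fp_reductions numbers above: the old and new costs for the vectors that fit a single 128-bit register are consistent with a simple scalarization sum, in which lane 0 is extracted for free, every other lane pays one extract, and every lane pays one scalar fadd. The sketch below is only a back-of-the-envelope check of that reading against the CHECK lines. It is not the code in AArch64TargetTransformInfo.cpp; the function name, the per-lane fadd cost of 2, and the fadd cost of 3 for half without +fullfp16 are assumptions fitted to the listed values.

```python
def strict_fadd_cost(num_lanes, extract_cost, fadd_cost=2):
    """Hypothetical model of an ordered (strict) fadd reduction cost.

    Assumptions (not taken from the LLVM sources): lane 0 is free to
    extract, each remaining lane costs `extract_cost`, and each lane
    adds one scalar fadd costing `fadd_cost`.
    """
    return (num_lanes - 1) * extract_cost + num_lanes * fadd_cost


OLD_EXTRACT, NEW_EXTRACT = 3, 2  # insert/extract cost before and after this patch

# (label, lanes, assumed scalar fadd cost, old cost in diff, new cost in diff)
rows = [
    ("v4f32",             4, 2, 17, 14),
    ("v2f64",             2, 2,  7,  6),
    ("v4f16 (+fullfp16)", 4, 2, 17, 14),
    ("v8f16 (+fullfp16)", 8, 2, 37, 30),
    ("v4f16 (no fp16)",   4, 3, 21, 18),
    ("v8f16 (no fp16)",   8, 3, 45, 38),
]

for label, lanes, fadd, old, new in rows:
    assert strict_fadd_cost(lanes, OLD_EXTRACT, fadd) == old, label
    assert strict_fadd_cost(lanes, NEW_EXTRACT, fadd) == new, label

# Vectors wider than 128 bits appear to be costed as the sum of their parts:
# v8f32 behaves like two v4f32 halves, v4f64 like two v2f64 halves.
assert 2 * strict_fadd_cost(4, OLD_EXTRACT) == 34
assert 2 * strict_fadd_cost(4, NEW_EXTRACT) == 28
assert 2 * strict_fadd_cost(2, OLD_EXTRACT) == 14
assert 2 * strict_fadd_cost(2, NEW_EXTRACT) == 12
```

Under this reading, each cost drops by exactly the number of non-zero lanes, which matches the summary's point that lane 0 is treated as free for floating-point vectors. The rows that do not change (v4bf16 without +bf16, and v4f128) are outside what this sketch tries to model.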


	define void @fast_fp_reductions() {			define void @fast_fp_reductions() {
	; CHECK-LABEL: 'fast_fp_reductions'			; CHECK-LABEL: 'fast_fp_reductions'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; FP16-LABEL: 'fast_fp_reductions'			; FP16-LABEL: 'fast_fp_reductions'
	; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 39 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; FP16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; FP16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; BF16-LABEL: 'fast_fp_reductions'			; BF16-LABEL: 'fast_fp_reductions'
	; BF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0xH0000, <4 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %fadd_v8f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v8f16(half 0xH0000, <8 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %fadd_v11f16 = call fast half @llvm.vector.reduce.fadd.v11f16(half 0xH0000, <11 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %fadd_v13f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v13f16(half 0xH0000, <13 x half> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32 = call fast float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32 = call fast float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %fadd_v8f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v8f32(float 0.000000e+00, <8 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 33 for instruction: %fadd_v13f32 = call fast float @llvm.vector.reduce.fadd.v13f32(float 0.000000e+00, <13 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v5f32_reassoc = call reassoc float @llvm.vector.reduce.fadd.v5f32(float 0.000000e+00, <5 x float> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64 = call fast double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %fadd_v2f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v2f64(double 0.000000e+00, <2 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64 = call fast double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %fadd_v4f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v4f64(double 0.000000e+00, <4 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 15 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %fadd_v7f64 = call fast double @llvm.vector.reduce.fadd.v7f64(double 0.000000e+00, <7 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %fadd_v9f64_reassoc = call reassoc double @llvm.vector.reduce.fadd.v9f64(double 0.000000e+00, <9 x double> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %fadd_v4f8 = call reassoc bfloat @llvm.vector.reduce.fadd.v4bf16(bfloat 0xR8000, <4 x bfloat> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)			; BF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %fadd_v4f128 = call reassoc fp128 @llvm.vector.reduce.fadd.v4f128(fp128 undef, <4 x fp128> undef)
	; BF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; BF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)			%fadd_v4f16_fast = call fast half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)
	%fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)			%fadd_v4f16_reassoc = call reassoc half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> undef)

	%fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0.0, <8 x half> undef)			%fadd_v8f16 = call fast half @llvm.vector.reduce.fadd.v8f16(half 0.0, <8 x half> undef)

llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll

%V8i32 = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> undef)		%V8i32 = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> undef)
%V2i64 = call i64 @llvm.vector.reduce.smax.v2i64(<2 x i64> undef)		%V2i64 = call i64 @llvm.vector.reduce.smax.v2i64(<2 x i64> undef)
%V4i64 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> undef)		%V4i64 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> undef)
ret void		ret void
}		}

define void @reduce_fmin16() {		define void @reduce_fmin16() {
; CHECK-NOF16-LABEL: 'reduce_fmin16'		; CHECK-NOF16-LABEL: 'reduce_fmin16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 174 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 252 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 174 for instruction: %V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 252 for instruction: %V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'reduce_fmin16'		; CHECK-F16-LABEL: 'reduce_fmin16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)
%V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)		%V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)
%V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)		%V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)
%V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)		%V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)
ret void		ret void
}		}

define void @reduce_fmax16() {		define void @reduce_fmax16() {
; CHECK-NOF16-LABEL: 'reduce_fmax16'		; CHECK-NOF16-LABEL: 'reduce_fmax16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 174 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 252 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16m = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %V4f16m = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16m = call half @llvm.vector.reduce.fmaximum.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 174 for instruction: %V8f16m = call half @llvm.vector.reduce.fmaximum.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16m = call half @llvm.vector.reduce.fmaximum.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 252 for instruction: %V16f16m = call half @llvm.vector.reduce.fmaximum.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'reduce_fmax16'		; CHECK-F16-LABEL: 'reduce_fmax16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)

llvm/test/Analysis/CostModel/AArch64/rem.ll

; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s		; RUN: opt < %s -mtriple=aarch64-unknown-linux-gnu -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

define i32 @srem() {		define i32 @srem() {
; CHECK-LABEL: 'srem'		; CHECK-LABEL: 'srem'
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = srem i64 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = srem i64 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = srem <2 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = srem <2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = srem <4 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = srem <4 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = srem <8 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = srem <8 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, undef		%I64 = srem i64 undef, undef
%V2i64 = srem <2 x i64> undef, undef		%V2i64 = srem <2 x i64> undef, undef
%V4i64 = srem <4 x i64> undef, undef		%V4i64 = srem <4 x i64> undef, undef
%V8i64 = srem <8 x i64> undef, undef		%V8i64 = srem <8 x i64> undef, undef

%I32 = srem i32 undef, undef		%I32 = srem i32 undef, undef
; ... (12 lines collapsed in the diff view)		; ... (12 lines collapsed in the diff view)
%V64i8 = srem <64 x i8> undef, undef		%V64i8 = srem <64 x i8> undef, undef

ret i32 undef		ret i32 undef
}		}

define i32 @urem() {		define i32 @urem() {
; CHECK-LABEL: 'urem'		; CHECK-LABEL: 'urem'
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = urem i64 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I64 = urem i64 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = urem <2 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = urem <2 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = urem <4 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = urem <4 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = urem <8 x i64> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = urem <8 x i64> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, undef		%I64 = urem i64 undef, undef
%V2i64 = urem <2 x i64> undef, undef		%V2i64 = urem <2 x i64> undef, undef
%V4i64 = urem <4 x i64> undef, undef		%V4i64 = urem <4 x i64> undef, undef
%V8i64 = urem <8 x i64> undef, undef		%V8i64 = urem <8 x i64> undef, undef

%I32 = urem i32 undef, undef		%I32 = urem i32 undef, undef
; ... (12 lines collapsed in the diff view)		; ... (12 lines collapsed in the diff view)
%V64i8 = urem <64 x i8> undef, undef		%V64i8 = urem <64 x i8> undef, undef

ret i32 undef		ret i32 undef
}		}
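
The old and new urem/srem numbers above follow a regular pattern. Here is a minimal sketch that reproduces them. The formula is fitted to the CHECK lines in this diff, not copied from AArch64TargetTransformInfo.cpp, so treat it as an assumption: the function name `scalarized_rem_cost`, the 128-bit split, and the factor of two (read here as one extract plus one insert per lane) are all hypothetical.

```python
def scalarized_rem_cost(num_elts, elt_bits, scalar_cost, lane_cost, lane0_free):
    """Estimate the cost of a scalarized vector rem (assumed model, see lead-in).

    num_elts    -- number of vector elements, e.g. 4 for <4 x i32>
    elt_bits    -- element width in bits
    scalar_cost -- cost reported for the scalar instruction (3 or 9 in this test)
    lane_cost   -- cost of a single vector insert or extract (3 before, 2 after)
    lane0_free  -- whether lane 0 is treated as free (True before, False after)
    """
    lanes_per_reg = 128 // elt_bits            # assumed 128-bit vector registers
    lanes = min(num_elts, lanes_per_reg)       # lanes in one legal vector
    parts = max(1, num_elts // lanes_per_reg)  # wider types split into parts
    paid_lanes = lanes - 1 if lane0_free else lanes
    # Per legal vector: the scalar op on every lane, plus two lane moves
    # for every lane that is not free.
    return parts * (scalar_cost * lanes + 2 * lane_cost * paid_lanes)


if __name__ == "__main__":
    for num_elts, elt_bits in [(2, 64), (4, 32), (8, 16), (16, 8), (64, 8)]:
        old = scalarized_rem_cost(num_elts, elt_bits, 3, 3, True)
        new = scalarized_rem_cost(num_elts, elt_bits, 3, 2, False)
        print(f"<{num_elts} x i{elt_bits}>: old={old} new={new}")
```

Under this model the i64 vectors get more expensive (12 to 14 for <2 x i64>) because dropping the free lane 0 outweighs the cheaper per-lane cost at two lanes, while the narrower element types get cheaper.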

define i32 @srem_const() {		define i32 @srem_const() {
; CHECK-LABEL: 'srem_const'		; CHECK-LABEL: 'srem_const'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = srem <2 x i64> undef, <i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = srem <2 x i64> undef, <i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = srem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = srem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = srem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = srem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, 7		%I64 = srem i64 undef, 7
%V2i64 = srem <2 x i64> undef, <i64 6, i64 7>		%V2i64 = srem <2 x i64> undef, <i64 6, i64 7>
%V4i64 = srem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		%V4i64 = srem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
%V8i64 = srem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		%V8i64 = srem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>

%I32 = srem i32 undef, 7		%I32 = srem i32 undef, 7
; ... (12 lines collapsed in the diff view)		; ... (12 lines collapsed in the diff view)
%V64i8 = srem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		%V64i8 = srem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_const() {		define i32 @urem_const() {
; CHECK-LABEL: 'urem_const'		; CHECK-LABEL: 'urem_const'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = urem <2 x i64> undef, <i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = urem <2 x i64> undef, <i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = urem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = urem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = urem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = urem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19, i16 4, i16 5, i16 6, i16 7, i16 8, i16 9, i16 10, i16 11, i16 12, i16 13, i16 14, i16 15, i16 16, i16 17, i16 18, i16 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, 7		%I64 = urem i64 undef, 7
%V2i64 = urem <2 x i64> undef, <i64 6, i64 7>		%V2i64 = urem <2 x i64> undef, <i64 6, i64 7>
%V4i64 = urem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>		%V4i64 = urem <4 x i64> undef, <i64 4, i64 5, i64 6, i64 7>
%V8i64 = urem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>		%V8i64 = urem <8 x i64> undef, <i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11>

%I32 = urem i32 undef, 7		%I32 = urem i32 undef, 7
; ... (12 lines collapsed in the diff view)		; ... (12 lines collapsed in the diff view)
%V64i8 = urem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		%V64i8 = urem <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>

ret i32 undef		ret i32 undef
}		}

define i32 @srem_uniformconst() {		define i32 @srem_uniformconst() {
; CHECK-LABEL: 'srem_uniformconst'		; CHECK-LABEL: 'srem_uniformconst'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = srem <2 x i64> undef, <i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V2i64 = srem <2 x i64> undef, <i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = srem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i64 = srem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = srem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i64 = srem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, 7		%I64 = srem i64 undef, 7
%V2i64 = srem <2 x i64> undef, <i64 7, i64 7>		%V2i64 = srem <2 x i64> undef, <i64 7, i64 7>
%V4i64 = srem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		%V4i64 = srem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
%V8i64 = srem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		%V8i64 = srem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>

%I32 = srem i32 undef, 7		%I32 = srem i32 undef, 7
; ... (12 lines collapsed in the diff view)		; ... (12 lines collapsed in the diff view)
%V64i8 = srem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		%V64i8 = srem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_uniformconst() {		define i32 @urem_uniformconst() {
; CHECK-LABEL: 'urem_uniformconst'		; CHECK-LABEL: 'urem_uniformconst'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = urem <2 x i64> undef, <i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V2i64 = urem <2 x i64> undef, <i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = urem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i64 = urem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = urem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i64 = urem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, 7		%I64 = urem i64 undef, 7
%V2i64 = urem <2 x i64> undef, <i64 7, i64 7>		%V2i64 = urem <2 x i64> undef, <i64 7, i64 7>
%V4i64 = urem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		%V4i64 = urem <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
%V8i64 = urem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		%V8i64 = urem <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>

%I32 = urem i32 undef, 7		%I32 = urem i32 undef, 7
; [12 unchanged lines elided]
%V64i8 = urem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		%V64i8 = urem <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>

ret i32 undef		ret i32 undef
}		}

define i32 @srem_constpow2() {		define i32 @srem_constpow2() {
; CHECK-LABEL: 'srem_constpow2'		; CHECK-LABEL: 'srem_constpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = srem i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = srem i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = srem <2 x i64> undef, <i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = srem <2 x i64> undef, <i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = srem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = srem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = srem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = srem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I32 = srem i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I32 = srem i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I16 = srem i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I16 = srem i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I8 = srem i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I8 = srem i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, 16		%I64 = srem i64 undef, 16
%V2i64 = srem <2 x i64> undef, <i64 8, i64 16>		%V2i64 = srem <2 x i64> undef, <i64 8, i64 16>
%V4i64 = srem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		%V4i64 = srem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
%V8i64 = srem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		%V8i64 = srem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>

%I32 = srem i32 undef, 16		%I32 = srem i32 undef, 16
; [12 unchanged lines elided]
%V64i8 = srem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		%V64i8 = srem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_constpow2() {		define i32 @urem_constpow2() {
; CHECK-LABEL: 'urem_constpow2'		; CHECK-LABEL: 'urem_constpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = urem <2 x i64> undef, <i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = urem <2 x i64> undef, <i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = urem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = urem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = urem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = urem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 2, i32 4, i32 8, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256, i32 2, i32 4, i32 8, i32 16, i32 32, i32 64, i32 128, i32 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128, i16 256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, 16		%I64 = urem i64 undef, 16
%V2i64 = urem <2 x i64> undef, <i64 8, i64 16>		%V2i64 = urem <2 x i64> undef, <i64 8, i64 16>
%V4i64 = urem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>		%V4i64 = urem <4 x i64> undef, <i64 2, i64 4, i64 8, i64 16>
%V8i64 = urem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		%V8i64 = urem <8 x i64> undef, <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>

%I32 = urem i32 undef, 16		%I32 = urem i32 undef, 16
; [12 unchanged lines elided]
%V64i8 = urem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>		%V64i8 = urem <64 x i8> undef, <i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16, i8 2, i8 4, i8 8, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @srem_uniformconstpow2() {		define i32 @srem_uniformconstpow2() {
; CHECK-LABEL: 'srem_uniformconstpow2'		; CHECK-LABEL: 'srem_uniformconstpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = srem i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = srem i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V2i64 = srem <2 x i64> undef, <i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V2i64 = srem <2 x i64> undef, <i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V4i64 = srem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V4i64 = srem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V8i64 = srem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i64 = srem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I32 = srem i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I32 = srem i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %V4i32 = srem <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V4i32 = srem <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 92 for instruction: %V8i32 = srem <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i32 = srem <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 184 for instruction: %V16i32 = srem <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i32 = srem <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I16 = srem i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I16 = srem i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 98 for instruction: %V8i16 = srem <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V8i16 = srem <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 196 for instruction: %V16i16 = srem <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i16 = srem <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 392 for instruction: %V32i16 = srem <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i16 = srem <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I8 = srem i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I8 = srem i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 202 for instruction: %V16i8 = srem <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %V16i8 = srem <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 404 for instruction: %V32i8 = srem <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 352 for instruction: %V32i8 = srem <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 808 for instruction: %V64i8 = srem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 704 for instruction: %V64i8 = srem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, 16		%I64 = srem i64 undef, 16
%V2i64 = srem <2 x i64> undef, <i64 16, i64 16>		%V2i64 = srem <2 x i64> undef, <i64 16, i64 16>
%V4i64 = srem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		%V4i64 = srem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
%V8i64 = srem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		%V8i64 = srem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>

%I32 = srem i32 undef, 16		%I32 = srem i32 undef, 16
; [12 unchanged lines elided]
%V64i8 = srem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		%V64i8 = srem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_uniformconstpow2() {		define i32 @urem_uniformconstpow2() {
; CHECK-LABEL: 'urem_uniformconstpow2'		; CHECK-LABEL: 'urem_uniformconstpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = urem <2 x i64> undef, <i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V2i64 = urem <2 x i64> undef, <i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = urem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i64 = urem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = urem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i64 = urem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, 16		%I64 = urem i64 undef, 16
%V2i64 = urem <2 x i64> undef, <i64 16, i64 16>		%V2i64 = urem <2 x i64> undef, <i64 16, i64 16>
%V4i64 = urem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		%V4i64 = urem <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
%V8i64 = urem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		%V8i64 = urem <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>

%I32 = urem i32 undef, 16		%I32 = urem i32 undef, 16
; ... 12 unchanged lines not shown ...		; ... 12 unchanged lines not shown ...
%V64i8 = urem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		%V64i8 = urem <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @srem_constnegpow2() {		define i32 @srem_constnegpow2() {
; CHECK-LABEL: 'srem_constnegpow2'		; CHECK-LABEL: 'srem_constnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = srem <2 x i64> undef, <i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = srem <2 x i64> undef, <i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = srem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = srem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = srem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = srem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, -16		%I64 = srem i64 undef, -16
%V2i64 = srem <2 x i64> undef, <i64 -8, i64 -16>		%V2i64 = srem <2 x i64> undef, <i64 -8, i64 -16>
%V4i64 = srem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		%V4i64 = srem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
%V8i64 = srem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		%V8i64 = srem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>

%I32 = srem i32 undef, -16		%I32 = srem i32 undef, -16
; ... 12 unchanged lines not shown ...		; ... 12 unchanged lines not shown ...
%V64i8 = srem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		%V64i8 = srem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_constnegpow2() {		define i32 @urem_constnegpow2() {
; CHECK-LABEL: 'urem_constnegpow2'		; CHECK-LABEL: 'urem_constnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2i64 = urem <2 x i64> undef, <i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2i64 = urem <2 x i64> undef, <i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V4i64 = urem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i64 = urem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V8i64 = urem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i64 = urem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256, i32 -2, i32 -4, i32 -8, i32 -16, i32 -32, i32 -64, i32 -128, i32 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256, i16 -2, i16 -4, i16 -8, i16 -16, i16 -32, i16 -64, i16 -128, i16 -256>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, -16		%I64 = urem i64 undef, -16
%V2i64 = urem <2 x i64> undef, <i64 -8, i64 -16>		%V2i64 = urem <2 x i64> undef, <i64 -8, i64 -16>
%V4i64 = urem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>		%V4i64 = urem <4 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16>
%V8i64 = urem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>		%V8i64 = urem <8 x i64> undef, <i64 -2, i64 -4, i64 -8, i64 -16, i64 -32, i64 -64, i64 -128, i64 -256>

%I32 = urem i32 undef, -16		%I32 = urem i32 undef, -16
; ... 12 unchanged lines not shown ...		; ... 12 unchanged lines not shown ...
%V64i8 = urem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		%V64i8 = urem <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @srem_uniformconstnegpow2() {		define i32 @srem_uniformconstnegpow2() {
; CHECK-LABEL: 'srem_uniformconstnegpow2'		; CHECK-LABEL: 'srem_uniformconstnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = srem i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = srem <2 x i64> undef, <i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V2i64 = srem <2 x i64> undef, <i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = srem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i64 = srem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = srem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i64 = srem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = srem i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = srem <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = srem <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = srem <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = srem <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = srem <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = srem <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = srem i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = srem <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = srem <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = srem <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = srem <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = srem <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = srem <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = srem i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = srem <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = srem <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = srem <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = srem <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = srem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = srem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = srem i64 undef, -16		%I64 = srem i64 undef, -16
%V2i64 = srem <2 x i64> undef, <i64 -16, i64 -16>		%V2i64 = srem <2 x i64> undef, <i64 -16, i64 -16>
%V4i64 = srem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		%V4i64 = srem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
%V8i64 = srem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		%V8i64 = srem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>

%I32 = srem i32 undef, -16		%I32 = srem i32 undef, -16
; ... 12 unchanged lines not shown ...		; ... 12 unchanged lines not shown ...
%V64i8 = srem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		%V64i8 = srem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @urem_uniformconstnegpow2() {		define i32 @urem_uniformconstnegpow2() {
; CHECK-LABEL: 'urem_uniformconstnegpow2'		; CHECK-LABEL: 'urem_uniformconstnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %I64 = urem i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2i64 = urem <2 x i64> undef, <i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V2i64 = urem <2 x i64> undef, <i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4i64 = urem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4i64 = urem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V8i64 = urem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i64 = urem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I32 = urem i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V4i32 = urem <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V4i32 = urem <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %V8i32 = urem <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i32 = urem <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 120 for instruction: %V16i32 = urem <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i32 = urem <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16 = urem i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 66 for instruction: %V8i16 = urem <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V8i16 = urem <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 132 for instruction: %V16i16 = urem <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i16 = urem <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 264 for instruction: %V32i16 = urem <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i16 = urem <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I8 = urem i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 138 for instruction: %V16i8 = urem <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V16i8 = urem <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 276 for instruction: %V32i8 = urem <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 224 for instruction: %V32i8 = urem <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 552 for instruction: %V64i8 = urem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V64i8 = urem <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = urem i64 undef, -16		%I64 = urem i64 undef, -16
%V2i64 = urem <2 x i64> undef, <i64 -16, i64 -16>		%V2i64 = urem <2 x i64> undef, <i64 -16, i64 -16>
%V4i64 = urem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		%V4i64 = urem <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
%V8i64 = urem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		%V8i64 = urem <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>

%I32 = urem i32 undef, -16		%I32 = urem i32 undef, -16
[16 unchanged lines not shown]
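
The old (left) and new (right) costs in the 'urem_uniformconstnegpow2' CHECK lines above can be reproduced with a small piece of arithmetic. The sketch below is an inferred reading of those numbers, not LLVM's actual implementation. It assumes the vector urem is fully scalarized per 128-bit register, that each lane pays the scalar urem cost from the CHECK lines (9 for i64, 3 for i32/i16/i8), and that each paid lane needs one extract of the variable operand and one insert of the result (the constant divisor needs no extract). Under the old model the per-lane insert/extract cost is 3 and lane 0 is free; under the new model it is 2 and lane 0 is no longer free for integer vectors, as the summary describes. The names `scalarized_cost`, `old_cost`, `new_cost`, `CHECKS` and `mismatches` are hypothetical and exist only for this illustration.

```python
# Inferred model of the scalarization costs in the CHECK lines above.
# All names are hypothetical; this is not LLVM's implementation.

def scalarized_cost(lanes, parts, scalar_cost, lane_cost, lane0_free):
    """Cost of scalarizing a vector op with one variable operand.

    lanes: elements per 128-bit register; parts: number of such registers.
    """
    paid_lanes = lanes - 1 if lane0_free else lanes
    # One scalar op per lane, plus one extract and one insert per paid lane.
    per_part = lanes * scalar_cost + 2 * paid_lanes * lane_cost
    return parts * per_part

def old_cost(lanes, parts, scalar_cost):
    # Before the patch: insert/extract cost 3, lane 0 treated as free.
    return scalarized_cost(lanes, parts, scalar_cost, lane_cost=3, lane0_free=True)

def new_cost(lanes, parts, scalar_cost):
    # After the patch: insert/extract cost 2, lane 0 paid for integer vectors.
    return scalarized_cost(lanes, parts, scalar_cost, lane_cost=2, lane0_free=False)

# name: (lanes per register, registers, scalar cost, old CHECK, new CHECK)
CHECKS = {
    "V2i64":  (2, 1, 9, 24, 26),
    "V4i64":  (2, 2, 9, 48, 52),
    "V8i64":  (2, 4, 9, 96, 104),
    "V4i32":  (4, 1, 3, 30, 28),
    "V8i32":  (4, 2, 3, 60, 56),
    "V16i32": (4, 4, 3, 120, 112),
    "V8i16":  (8, 1, 3, 66, 56),
    "V16i16": (8, 2, 3, 132, 112),
    "V32i16": (8, 4, 3, 264, 224),
    "V16i8":  (16, 1, 3, 138, 112),
    "V32i8":  (16, 2, 3, 276, 224),
    "V64i8":  (16, 4, 3, 552, 448),
}

def mismatches():
    """Return the names whose modelled costs differ from the CHECK lines."""
    bad = []
    for name, (lanes, parts, scalar, old, new) in CHECKS.items():
        if old_cost(lanes, parts, scalar) != old or new_cost(lanes, parts, scalar) != new:
            bad.append(name)
    return bad
```

Under these assumptions the model also accounts for why the i64 cases get slightly more expensive (24 to 26 for V2i64) while the narrower types get cheaper: with only two lanes, paying for lane 0 outweighs the lower per-lane cost.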

llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll

[259 unchanged lines not shown]
ret <2 x double> %lane		ret <2 x double> %lane
}		}

; Check ld1r generated from scalar integer loads		; Check ld1r generated from scalar integer loads

define <8 x i8> @ld1r_8b_int_shuff(ptr nocapture %x) {		define <8 x i8> @ld1r_8b_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_8b_int_shuff'		; CHECK-LABEL: 'ld1r_8b_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i8> %lane
;		;
; CODESIZE-LABEL: 'ld1r_8b_int_shuff'		; CODESIZE-LABEL: 'ld1r_8b_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <8 x i8> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <8 x i8> %lane
;		;
entry:		entry:
%tmp = load i8, ptr %x, align 2		%tmp = load i8, ptr %x, align 2
%tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0		%tmp1 = insertelement <8 x i8> undef, i8 %tmp, i8 0
%lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer		%lane = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer
ret <8 x i8> %lane		ret <8 x i8> %lane
}		}

define <16 x i8> @ld1r_16b_int_shuff(ptr nocapture %x) {		define <16 x i8> @ld1r_16b_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_16b_int_shuff'		; CHECK-LABEL: 'ld1r_16b_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %lane
;		;
; CODESIZE-LABEL: 'ld1r_16b_int_shuff'		; CODESIZE-LABEL: 'ld1r_16b_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i8, ptr %x, align 2
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <16 x i8> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <16 x i8> %lane
;		;
entry:		entry:
%tmp = load i8, ptr %x, align 2		%tmp = load i8, ptr %x, align 2
%tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0		%tmp1 = insertelement <16 x i8> undef, i8 %tmp, i8 0
%lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer		%lane = shufflevector <16 x i8> %tmp1, <16 x i8> undef, <16 x i32> zeroinitializer
ret <16 x i8> %lane		ret <16 x i8> %lane
}		}

define <4 x i16> @ld1r_4h_int_shuff(ptr nocapture %x) {		define <4 x i16> @ld1r_4h_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_4h_int_shuff'		; CHECK-LABEL: 'ld1r_4h_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i16> %lane
;		;
; CODESIZE-LABEL: 'ld1r_4h_int_shuff'		; CODESIZE-LABEL: 'ld1r_4h_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <4 x i16> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <4 x i16> %lane
;		;
entry:		entry:
%tmp = load i16, ptr %x, align 2		%tmp = load i16, ptr %x, align 2
%tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0		%tmp1 = insertelement <4 x i16> undef, i16 %tmp, i16 0
%lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer		%lane = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
ret <4 x i16> %lane		ret <4 x i16> %lane
}		}

define <8 x i16> @ld1r_8h_int_shuff(ptr nocapture %x) {		define <8 x i16> @ld1r_8h_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_8h_int_shuff'		; CHECK-LABEL: 'ld1r_8h_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %lane
;		;
; CODESIZE-LABEL: 'ld1r_8h_int_shuff'		; CODESIZE-LABEL: 'ld1r_8h_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i16, ptr %x, align 2
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <8 x i16> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <8 x i16> %lane
;		;
entry:		entry:
%tmp = load i16, ptr %x, align 2		%tmp = load i16, ptr %x, align 2
%tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0		%tmp1 = insertelement <8 x i16> undef, i16 %tmp, i16 0
%lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer		%lane = shufflevector <8 x i16> %tmp1, <8 x i16> undef, <8 x i32> zeroinitializer
ret <8 x i16> %lane		ret <8 x i16> %lane
}		}

define <2 x i32> @ld1r_2s_int_shuff(ptr nocapture %x) {		define <2 x i32> @ld1r_2s_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_2s_int_shuff'		; CHECK-LABEL: 'ld1r_2s_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %lane
;		;
; CODESIZE-LABEL: 'ld1r_2s_int_shuff'		; CODESIZE-LABEL: 'ld1r_2s_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <2 x i32> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <2 x i32> %lane
;		;
entry:		entry:
%tmp = load i32, ptr %x, align 4		%tmp = load i32, ptr %x, align 4
%tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0		%tmp1 = insertelement <2 x i32> undef, i32 %tmp, i32 0
%lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer		%lane = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer
ret <2 x i32> %lane		ret <2 x i32> %lane
}		}

define <4 x i32> @ld1r_4s_int_shuff(ptr nocapture %x) {		define <4 x i32> @ld1r_4s_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_4s_int_shuff'		; CHECK-LABEL: 'ld1r_4s_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %lane
;		;
; CODESIZE-LABEL: 'ld1r_4s_int_shuff'		; CODESIZE-LABEL: 'ld1r_4s_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i32, ptr %x, align 4
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <4 x i32> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <4 x i32> %lane
;		;
entry:		entry:
%tmp = load i32, ptr %x, align 4		%tmp = load i32, ptr %x, align 4
%tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0		%tmp1 = insertelement <4 x i32> undef, i32 %tmp, i32 0
%lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer		%lane = shufflevector <4 x i32> %tmp1, <4 x i32> undef, <4 x i32> zeroinitializer
ret <4 x i32> %lane		ret <4 x i32> %lane
}		}

define <2 x i64> @ld1r_2d_int_shuff(ptr nocapture %x) {		define <2 x i64> @ld1r_2d_int_shuff(ptr nocapture %x) {
; CHECK-LABEL: 'ld1r_2d_int_shuff'		; CHECK-LABEL: 'ld1r_2d_int_shuff'
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i64, ptr %x, align 8		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i64, ptr %x, align 8
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0		; CHECK-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %lane		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %lane
;		;
; CODESIZE-LABEL: 'ld1r_2d_int_shuff'		; CODESIZE-LABEL: 'ld1r_2d_int_shuff'
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i64, ptr %x, align 8		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %tmp = load i64, ptr %x, align 8
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer
; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <2 x i64> %lane		; CODESIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret <2 x i64> %lane
;		;
entry:		entry:
%tmp = load i64, ptr %x, align 8		%tmp = load i64, ptr %x, align 8
%tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0		%tmp1 = insertelement <2 x i64> undef, i64 %tmp, i32 0
%lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer		%lane = shufflevector <2 x i64> %tmp1, <2 x i64> undef, <2 x i32> zeroinitializer
ret <2 x i64> %lane		ret <2 x i64> %lane
}		}

llvm/test/Analysis/CostModel/AArch64/shuffle-other.ll

[81 unchanged lines not shown]

%v4f64 = shufflevector <2 x double> undef, <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%v4f64 = shufflevector <2 x double> undef, <2 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

ret void		ret void
}		}

define void @insert_subvec() {		define void @insert_subvec() {
; CHECK-LABEL: 'insert_subvec'		; CHECK-LABEL: 'insert_subvec'
; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i8_2_1 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i8_2_1 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_1 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_1 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_2 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_2 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_3 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i8_2_3 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i8_2_05 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v8i8_2_05 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 90 for instruction: %v16i8_4_0 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 60 for instruction: %v16i8_4_0 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_1 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_1 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_2 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_2 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_3 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i8_4_3 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %v16i8_4_05 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v16i8_4_05 = shufflevector <16 x i8> undef, <16 x i8> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i16_2_0 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i16_2_0 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i16_2_1 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i16_2_1 = shufflevector <4 x i16> undef, <4 x i16> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v8i16_2_0 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v8i16_2_0 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_1 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_1 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_2 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_2 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_3 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i16_2_3 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i16_2_05 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v8i16_2_05 = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %v16i16_4_0 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %v16i16_4_0 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_1 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_1 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_2 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_2 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_3 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i16_4_3 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %v16i16_4_05 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %v16i16_4_05 = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_2_0 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_2_0 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_2_1 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v4i32_2_1 = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_0 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_0 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_1 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_1 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_2 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_2 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_3 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8i32_2_3 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; CHECK-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %v8i32_2_05 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %v8i32_2_05 = shufflevector <8 x i32> undef, <8 x i32> undef, <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_0 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v16i32_4_0 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_1 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_1 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 16, i32 17, i32 18, i32 19, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_2 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_2 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_3 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16i32_4_3 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %v16i32_4_05 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v16i32_4_05 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
%v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>		%v4i8_2_0 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
%v4i8_2_1 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>		%v4i8_2_1 = shufflevector <4 x i8> undef, <4 x i8> undef, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
%v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%v8i8_2_0 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 8, i32 9, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%v8i8_2_1 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>		%v8i8_2_1 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
%v8i8_2_2 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>		%v8i8_2_2 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
%v8i8_2_3 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		%v8i8_2_3 = shufflevector <8 x i8> undef, <8 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
%v16i32_4_3 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		%v16i32_4_3 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
%v16i32_4_05 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%v16i32_4_05 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>

ret void		ret void
}		}

define void @multipart() {		define void @multipart() {
; CHECK-LABEL: 'multipart'		; CHECK-LABEL: 'multipart'
; CHECK-NEXT: Cost Model: Found an estimated cost of 42 for instruction: %v16a = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>		; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %v16a = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16b = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16b = shufflevector <8 x i16> undef, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %v16c = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %v16c = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %v16d = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %v16d = shufflevector <16 x i16> undef, <16 x i16> undef, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32a = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v32a = shufflevector <4 x i32> undef, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32a4 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32a4 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32idrev = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 15, i32 14, i32 13, i32 12, i32 16, i32 17, i32 18, i32 19, i32 31, i32 30, i32 29, i32 28>		; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v32idrev = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 15, i32 14, i32 13, i32 12, i32 16, i32 17, i32 18, i32 19, i32 31, i32 30, i32 29, i32 28>
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v32many = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v32many = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32many2 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 1, i32 4, i32 8, i32 12, i32 17, i32 20, i32 24, i32 28, i32 2, i32 6, i32 11, i32 14, i32 18, i32 22, i32 27, i32 30>		; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v32many2 = shufflevector <16 x i32> undef, <16 x i32> undef, <16 x i32> <i32 1, i32 4, i32 8, i32 12, i32 17, i32 20, i32 24, i32 28, i32 2, i32 6, i32 11, i32 14, i32 18, i32 22, i32 27, i32 30>
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v323 = shufflevector <3 x i32> undef, <3 x i32> undef, <3 x i32> <i32 2, i32 3, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v323 = shufflevector <3 x i32> undef, <3 x i32> undef, <3 x i32> <i32 2, i32 3, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64a = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64a = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64b = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v64b = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64ab = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 1, i32 1, i32 0, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64ab = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 1, i32 1, i32 0, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64d = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 1, i32 1, i32 4, i32 4>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v64d = shufflevector <4 x i64> undef, <4 x i64> undef, <4 x i32> <i32 1, i32 1, i32 4, i32 4>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64a = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64a = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64b = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64b = shufflevector <2 x double> undef, <2 x double> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %f64ab = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 0, i32 0>		; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %f64ab = shufflevector <4 x double> undef, <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 0, i32 0>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void

llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll

	; RUN: opt < %s -mtriple=aarch64--linux-gnu -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s --check-prefix=COST			; RUN: opt < %s -mtriple=aarch64--linux-gnu -passes="print<cost-model>" 2>&1 -disable-output | FileCheck %s --check-prefix=COST
	; RUN: llc < %s -mtriple=aarch64--linux-gnu | FileCheck %s --check-prefix=CODE			; RUN: llc < %s -mtriple=aarch64--linux-gnu | FileCheck %s --check-prefix=CODE

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	; COST-LABEL: sel.v8i8			; COST-LABEL: sel.v8i8
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>			; COST: Found an estimated cost of 28 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
	; CODE-LABEL: sel.v8i8			; CODE-LABEL: sel.v8i8
	; CODE: tbl v0.8b, { v0.16b }, v1.8b			; CODE: tbl v0.8b, { v0.16b }, v1.8b
	define <8 x i8> @sel.v8i8(<8 x i8> %v0, <8 x i8> %v1) {			define <8 x i8> @sel.v8i8(<8 x i8> %v0, <8 x i8> %v1) {
	%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>			%tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
	ret <8 x i8> %tmp0			ret <8 x i8> %tmp0
	}			}

	; COST-LABEL: sel.v16i8			; COST-LABEL: sel.v16i8
	; COST: Found an estimated cost of 90 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>			; COST: Found an estimated cost of 60 for instruction: %tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
	; CODE-LABEL: sel.v16i8			; CODE-LABEL: sel.v16i8
	; CODE: tbl v0.16b, { v0.16b, v1.16b }, v2.16b			; CODE: tbl v0.16b, { v0.16b, v1.16b }, v2.16b
	define <16 x i8> @sel.v16i8(<16 x i8> %v0, <16 x i8> %v1) {			define <16 x i8> @sel.v16i8(<16 x i8> %v0, <16 x i8> %v1) {
	%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>			%tmp0 = shufflevector <16 x i8> %v0, <16 x i8> %v1, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
	ret <16 x i8> %tmp0			ret <16 x i8> %tmp0
	}			}

	; COST-LABEL: sel.v4i16			; COST-LABEL: sel.v4i16
	; COST: Found an estimated cost of 2 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 5, i32 2, i32 7>			; COST: Found an estimated cost of 2 for instruction: %tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	; CODE-LABEL: sel.v4i16			; CODE-LABEL: sel.v4i16
	; CODE: rev32 v0.4h, v0.4h			; CODE: rev32 v0.4h, v0.4h
	; CODE: trn2 v0.4h, v0.4h, v1.4h			; CODE: trn2 v0.4h, v0.4h, v1.4h
	define <4 x i16> @sel.v4i16(<4 x i16> %v0, <4 x i16> %v1) {			define <4 x i16> @sel.v4i16(<4 x i16> %v0, <4 x i16> %v1) {
	%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 5, i32 2, i32 7>			%tmp0 = shufflevector <4 x i16> %v0, <4 x i16> %v1, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
	ret <4 x i16> %tmp0			ret <4 x i16> %tmp0
	}			}

	; COST-LABEL: sel.v8i16			; COST-LABEL: sel.v8i16
	; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>			; COST: Found an estimated cost of 28 for instruction: %tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
				efriedma: Weird cost modeling.
				dmgreen (author): Do you mean because of the tbl? We have never costed tbls as cheap. I'm not sure if that would be profitable or not, and feels very much like a different issue.
				efriedma: We should be modeling the fact that tbl exists, at least. (I mean, it doesn't need to be super-cheap, but basically all ARM chips have a reasonably fast tbl.)
	; CODE-LABEL: sel.v8i16			; CODE-LABEL: sel.v8i16
	; CODE: tbl v0.16b, { v0.16b, v1.16b }, v2.16b			; CODE: tbl v0.16b, { v0.16b, v1.16b }, v2.16b
	define <8 x i16> @sel.v8i16(<8 x i16> %v0, <8 x i16> %v1) {			define <8 x i16> @sel.v8i16(<8 x i16> %v0, <8 x i16> %v1) {
	%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>			%tmp0 = shufflevector <8 x i16> %v0, <8 x i16> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
	ret <8 x i16> %tmp0			ret <8 x i16> %tmp0
	}			}

	; COST-LABEL: sel.v2i32			; COST-LABEL: sel.v2i32

llvm/test/Analysis/CostModel/AArch64/sve-insert-extract.ll

; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py

; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-DEFAULT %s		; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-DEFAULT %s
; RUN: opt -aarch64-insert-extract-base-cost=0 -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-LOW %s		; RUN: opt -aarch64-insert-extract-base-cost=0 -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-LOW %s
; RUN: opt -aarch64-insert-extract-base-cost=100000 -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-HIGH %s		; RUN: opt -aarch64-insert-extract-base-cost=100000 -passes="print<cost-model>" 2>&1 -disable-output -S < %s | FileCheck --check-prefix=CHECK-HIGH %s

target triple = "aarch64-unknown-linux-gnu"		target triple = "aarch64-unknown-linux-gnu"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

define void @ins_el0() #0 {		define void @ins_el0() #0 {
; CHECK-DEFAULT-LABEL: 'ins_el0'		; CHECK-DEFAULT-LABEL: 'ins_el0'
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0.000000e+00, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0.000000e+00, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0.000000e+00, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0.000000e+00, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-LOW-LABEL: 'ins_el0'		; CHECK-LOW-LABEL: 'ins_el0'
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 0
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 0
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 0
%v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 0		%v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 0
%v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0., i64 0		%v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0., i64 0
%v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0., i64 0		%v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0., i64 0
ret void		ret void
}		}

define void @ins_el1() #0 {		define void @ins_el1() #0 {
; CHECK-DEFAULT-LABEL: 'ins_el1'		; CHECK-DEFAULT-LABEL: 'ins_el1'
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0.000000e+00, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0.000000e+00, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0.000000e+00, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0.000000e+00, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-LOW-LABEL: 'ins_el1'		; CHECK-LOW-LABEL: 'ins_el1'
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = insertelement <vscale x 16 x i1> zeroinitializer, i1 false, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = insertelement <vscale x 16 x i8> zeroinitializer, i8 0, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = insertelement <vscale x 8 x i16> zeroinitializer, i16 0, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2 = insertelement <vscale x 4 x i32> zeroinitializer, i32 0, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v3 = insertelement <vscale x 2 x i64> zeroinitializer, i64 0, i64 1
; [19 lines not shown]
%v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0., i64 1		%v4 = insertelement <vscale x 4 x float> zeroinitializer, float 0., i64 1
%v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0., i64 1		%v5 = insertelement <vscale x 2 x double> zeroinitializer, double 0., i64 1
ret void		ret void
}		}


define void @ext_el0() #0 {		define void @ext_el0() #0 {
; CHECK-DEFAULT-LABEL: 'ext_el0'		; CHECK-DEFAULT-LABEL: 'ext_el0'
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 0		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 0
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-LOW-LABEL: 'ext_el0'		; CHECK-LOW-LABEL: 'ext_el0'
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 0
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 0
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 0		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 0
; [20 lines not shown]
%v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 0		%v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 0
%v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 0		%v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 0
%v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 0		%v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 0
ret void		ret void
}		}

define void @ext_el1() #0 {		define void @ext_el1() #0 {
; CHECK-DEFAULT-LABEL: 'ext_el1'		; CHECK-DEFAULT-LABEL: 'ext_el1'
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4 = extractelement <vscale x 4 x float> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 1		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v5 = extractelement <vscale x 2 x double> zeroinitializer, i64 1
; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-DEFAULT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-LOW-LABEL: 'ext_el1'		; CHECK-LOW-LABEL: 'ext_el1'
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %vi1 = extractelement <vscale x 16 x i1> zeroinitializer, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v0 = extractelement <vscale x 16 x i8> zeroinitializer, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v1 = extractelement <vscale x 8 x i16> zeroinitializer, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v2 = extractelement <vscale x 4 x i32> zeroinitializer, i64 1
; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 1		; CHECK-LOW-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %v3 = extractelement <vscale x 2 x i64> zeroinitializer, i64 1
; [73 lines not shown]
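
The rows above pair the old CHECK line (left) with the new one (right). As a reading aid, here is a minimal sketch that pulls the two costs out of one such row. It assumes the two columns are separated by a run of tab characters, which is a property of this archived rendering and not of the test file itself. The helper names `split_side_by_side` and `cost_change` are hypothetical and not part of LLVM.

```python
import re

# Matches the "Found an estimated cost of N for instruction: ..." text that
# the CHECK lines in this diff embed.
COST_RE = re.compile(r"Found an estimated cost of (\d+) for instruction: (.+)")


def split_side_by_side(line):
    """Split one pasted diff row into (old, new) columns.

    Assumption: the columns are separated by one or more tabs, as in this
    archived page. Returns None when the row does not have exactly two columns.
    """
    parts = [p.strip() for p in re.split(r"\t+", line.strip()) if p.strip()]
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def cost_change(line):
    """Return (instruction, old_cost, new_cost), or None for non-cost rows."""
    sides = split_side_by_side(line)
    if sides is None:
        return None
    old, new = (COST_RE.search(side) for side in sides)
    if not old or not new:
        return None
    return old.group(2), int(old.group(1)), int(new.group(1))
```

Applied to the `%v0 = insertelement <vscale x 16 x i8> ...` row of `ins_el1`, the helper reports an old cost of 3 and a new cost of 2. Rows that carry no cost text, such as `ret void`, yield `None`.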

llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s			; RUN: opt < %s -passes="print<cost-model>" 2>&1 -disable-output -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s
	; RUN: opt < %s -passes="print<cost-model>" 2>&1 -type-based-intrinsic-cost -disable-output -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s --check-prefix=TYPE_BASED_ONLY			; RUN: opt < %s -passes="print<cost-model>" 2>&1 -type-based-intrinsic-cost -disable-output -S -mtriple=aarch64--linux-gnu -mattr=+sve | FileCheck %s --check-prefix=TYPE_BASED_ONLY

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {			define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {
	; CHECK-LABEL: 'vector_insert_extract'			; CHECK-LABEL: 'vector_insert_extract'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 54 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 54 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; TYPE_BASED_ONLY-LABEL: 'vector_insert_extract'			; TYPE_BASED_ONLY-LABEL: 'vector_insert_extract'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
	; [594 lines not shown]
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 128 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 144 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)			; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; TYPE_BASED_ONLY-LABEL: 'get_lane_mask'			; TYPE_BASED_ONLY-LABEL: 'get_lane_mask'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i64 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i64 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv8i1_i64 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv8i1_i64 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv4i1_i64 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv4i1_i64 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv2i1_i64 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv2i1_i64 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i32 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i32 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv8i1_i32 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv8i1_i32 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv4i1_i32 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv4i1_i32 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv2i1_i32 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv2i1_i32 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv32i1_i64 = call <vscale x 32 x i1> @llvm.get.active.lane.mask.nxv32i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv32i1_i64 = call <vscale x 32 x i1> @llvm.get.active.lane.mask.nxv32i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i16 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 undef, i16 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Invalid cost for instruction: %mask_nxv16i1_i16 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i16(i16 undef, i16 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 76 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %mask_v16i1_i64 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %mask_v8i1_i64 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %mask_v4i1_i64 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %mask_v2i1_i64 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 76 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %mask_v16i1_i32 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %mask_v8i1_i32 = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %mask_v4i1_i32 = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %mask_v2i1_i32 = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i32(i32 undef, i32 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 152 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %mask_v32i1_i64 = call <32 x i1> @llvm.get.active.lane.mask.v32i1.i64(i64 undef, i64 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 76 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %mask_v16i1_i16 = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i16(i16 undef, i16 undef)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%mask_nxv16i1_i64 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 undef, i64 undef)			%mask_nxv16i1_i64 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 undef, i64 undef)
	%mask_nxv8i1_i64 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 undef, i64 undef)			%mask_nxv8i1_i64 = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 undef, i64 undef)
	%mask_nxv4i1_i64 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 undef, i64 undef)			%mask_nxv4i1_i64 = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 undef, i64 undef)
	%mask_nxv2i1_i64 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 undef, i64 undef)			%mask_nxv2i1_i64 = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 undef, i64 undef)

	%mask_nxv16i1_i32 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 undef, i32 undef)			%mask_nxv16i1_i32 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i32(i32 undef, i32 undef)
	; [87 lines not shown]
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 8 x i32> %res			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 8 x i32> %res
	;			;
	%res = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x ptr> %ld, i32 0, <vscale x 8 x i1> %masks, <vscale x 8 x i32> %passthru)			%res = call <vscale x 8 x i32> @llvm.masked.gather.nxv8i32(<vscale x 8 x ptr> %ld, i32 0, <vscale x 8 x i1> %masks, <vscale x 8 x i32> %passthru)
	ret <vscale x 8 x i32> %res			ret <vscale x 8 x i32> %res
	}			}

	define <4 x i32> @masked_gather_v4i32(<4 x ptr> %ld, <4 x i1> %masks, <4 x i32> %passthru) {			define <4 x i32> @masked_gather_v4i32(<4 x ptr> %ld, <4 x i1> %masks, <4 x i32> %passthru) {
	; CHECK-LABEL: 'masked_gather_v4i32'			; CHECK-LABEL: 'masked_gather_v4i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %res = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)			; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %res = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %res			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %res
	;			;
	; TYPE_BASED_ONLY-LABEL: 'masked_gather_v4i32'			; TYPE_BASED_ONLY-LABEL: 'masked_gather_v4i32'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %res = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %res = call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %res			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %res
	;			;
	%res = call <4 x i32> @llvm.masked.gather.v4i32(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)			%res = call <4 x i32> @llvm.masked.gather.v4i32(<4 x ptr> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)
	ret <4 x i32> %res			ret <4 x i32> %res
	}			}

	define <1 x i128> @masked_gather_v1i128(<1 x ptr> %ld, <1 x i1> %masks, <1 x i128> %passthru) {			define <1 x i128> @masked_gather_v1i128(<1 x ptr> %ld, <1 x i1> %masks, <1 x i128> %passthru) {
	; CHECK-LABEL: 'masked_gather_v1i128'			; CHECK-LABEL: 'masked_gather_v1i128'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)			; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <1 x i128> %res			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <1 x i128> %res
	;			;
	; TYPE_BASED_ONLY-LABEL: 'masked_gather_v1i128'			; TYPE_BASED_ONLY-LABEL: 'masked_gather_v1i128'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <1 x i128> %res			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <1 x i128> %res
	;			;
	%res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)			%res = call <1 x i128> @llvm.masked.gather.v1i128.v1p0(<1 x ptr> %ld, i32 0, <1 x i1> %masks, <1 x i128> %passthru)
	ret <1 x i128> %res			ret <1 x i128> %res
	; [24 lines not shown]
	;			;

	call void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32> %data, <vscale x 8 x ptr> %ptrs, i32 0, <vscale x 8 x i1> %masks)			call void @llvm.masked.scatter.nxv8i32(<vscale x 8 x i32> %data, <vscale x 8 x ptr> %ptrs, i32 0, <vscale x 8 x i1> %masks)
	ret void			ret void
	}			}

	define void @masked_scatter_v4i32(<4 x i32> %data, <4 x ptr> %ptrs, <4 x i1> %masks) {			define void @masked_scatter_v4i32(<4 x i32> %data, <4 x ptr> %ptrs, <4 x i1> %masks) {
	; CHECK-LABEL: 'masked_scatter_v4i32'			; CHECK-LABEL: 'masked_scatter_v4i32'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 29 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)			; CHECK-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; TYPE_BASED_ONLY-LABEL: 'masked_scatter_v4i32'			; TYPE_BASED_ONLY-LABEL: 'masked_scatter_v4i32'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 31 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 28 for instruction: call void @llvm.masked.scatter.v4i32.v4p0(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;

	call void @llvm.masked.scatter.v4i32(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)			call void @llvm.masked.scatter.v4i32(<4 x i32> %data, <4 x ptr> %ptrs, i32 0, <4 x i1> %masks)
	ret void			ret void
	}			}

	define void @masked_scatter_v1i128(<1 x i128> %data, <1 x ptr> %ptrs, <1 x i1> %masks) {			define void @masked_scatter_v1i128(<1 x i128> %data, <1 x ptr> %ptrs, <1 x i1> %masks) {
	; CHECK-LABEL: 'masked_scatter_v1i128'			; CHECK-LABEL: 'masked_scatter_v1i128'
	; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)			; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)
	; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; TYPE_BASED_ONLY-LABEL: 'masked_scatter_v1i128'			; TYPE_BASED_ONLY-LABEL: 'masked_scatter_v1i128'
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)
	; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; TYPE_BASED_ONLY-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;

	call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)			call void @llvm.masked.scatter.v1i128.v1p0(<1 x i128> %data, <1 x ptr> %ptrs, i32 0, <1 x i1> %masks)

llvm/test/Analysis/CostModel/AArch64/vector-select.ll

	define <3 x i64> @v3i64_select_sle(<3 x i64> %a, <3 x i64> %b, <3 x i64> %c) {			define <3 x i64> @v3i64_select_sle(<3 x i64> %a, <3 x i64> %b, <3 x i64> %c) {
	%cmp.1 = icmp sle <3 x i64> %a, %b			%cmp.1 = icmp sle <3 x i64> %a, %b
	%s.1 = select <3 x i1> %cmp.1, <3 x i64> %a, <3 x i64> %c			%s.1 = select <3 x i1> %cmp.1, <3 x i64> %a, <3 x i64> %c
	ret <3 x i64> %s.1			ret <3 x i64> %s.1
	}			}

	; COST-LABEL: v2i64_select_no_cmp			; COST-LABEL: v2i64_select_no_cmp
	; COST-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %s.1 = select <2 x i1> %cond, <2 x i64> %a, <2 x i64> %b			; COST-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %s.1 = select <2 x i1> %cond, <2 x i64> %a, <2 x i64> %b

	; CODE-LABEL: v2i64_select_no_cmp			; CODE-LABEL: v2i64_select_no_cmp
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: ushll v{{.+}}.2d, v{{.+}}.2s, #0			; CODE-NEXT: ushll v{{.+}}.2d, v{{.+}}.2s, #0
	; CODE-NEXT: shl v{{.+}}.2d, v{{.+}}.2d, #63			; CODE-NEXT: shl v{{.+}}.2d, v{{.+}}.2d, #63
	; CODE-NEXT: cmlt v{{.+}}.2d, v{{.+}}.2d, #0			; CODE-NEXT: cmlt v{{.+}}.2d, v{{.+}}.2d, #0
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	; CODE-NEXT: ret			; CODE-NEXT: ret
	;			;
	%cmp.1 = fcmp ogt <4 x half> %a, %b			%cmp.1 = fcmp ogt <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_ogt(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_ogt(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_ogt			; COST-LABEL: v8f16_select_ogt
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp ogt <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp ogt <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp ogt <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp ogt <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_ogt			; CODE-LABEL: v8f16_select_ogt
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	;			;
	%cmp.1 = fcmp oge <4 x half> %a, %b			%cmp.1 = fcmp oge <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_oge(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_oge(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_oge			; COST-LABEL: v8f16_select_oge
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp oge <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp oge <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp oge <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp oge <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_oge			; CODE-LABEL: v8f16_select_oge
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmge v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmge v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	;			;
	%cmp.1 = fcmp olt <4 x half> %a, %b			%cmp.1 = fcmp olt <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_olt(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_olt(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_olt			; COST-LABEL: v8f16_select_olt
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp olt <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp olt <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp olt <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp olt <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_olt			; CODE-LABEL: v8f16_select_olt
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	;			;
	%cmp.1 = fcmp ole <4 x half> %a, %b			%cmp.1 = fcmp ole <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_ole(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_ole(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_ole			; COST-LABEL: v8f16_select_ole
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp ole <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp ole <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp ole <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp ole <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_ole			; CODE-LABEL: v8f16_select_ole
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmge v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmge v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	;			;
	%cmp.1 = fcmp oeq <4 x half> %a, %b			%cmp.1 = fcmp oeq <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_oeq(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_oeq(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_oeq			; COST-LABEL: v8f16_select_oeq
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp oeq <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp oeq <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp oeq <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp oeq <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_oeq			; CODE-LABEL: v8f16_select_oeq
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmeq v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmeq v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bif v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b
	;			;
	%cmp.1 = fcmp one <4 x half> %a, %b			%cmp.1 = fcmp one <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_one(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_one(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_one			; COST-LABEL: v8f16_select_one
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp one <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp one <8 x half> %a, %b
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp one <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp one <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_one			; CODE-LABEL: v8f16_select_one
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmgt v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	;			;
	%cmp.1 = fcmp une <4 x half> %a, %b			%cmp.1 = fcmp une <4 x half> %a, %b
	%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c			%s.1 = select <4 x i1> %cmp.1, <4 x half> %a, <4 x half> %c
	ret <4 x half> %s.1			ret <4 x half> %s.1
	}			}

	define <8 x half> @v8f16_select_une(<8 x half> %a, <8 x half> %b, <8 x half> %c) {			define <8 x half> @v8f16_select_une(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
	; COST-LABEL: v8f16_select_une			; COST-LABEL: v8f16_select_une
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %cmp.1 = fcmp une <8 x half> %a, %b			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cmp.1 = fcmp une <8 x half> %a, %b
				efriedma: Cost modeling is weird.
				dmgreen (Author): Because it is too low? It is scalarized without +fullfp16. That codegen could be better, and it looks like the cost is a bit low, not accounting for the scalarization cost of the extracts. I don't think we have focussed much in the past on the combination of fp16 code without fullfp16.
				efriedma: Wait, we scalarize this? I thought I checked this, but must not have. We really shouldn't scalarize, though.
				SjoerdMeijer: Without fullfp16 support, which is what this is checking with "COST-NOFP16-NEXT", I expect this to get scalarised.
	; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-NOFP16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp une <8 x half> %a, %b			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cmp.1 = fcmp une <8 x half> %a, %b
	; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c			; COST-FULLFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s.1 = select <8 x i1> %cmp.1, <8 x half> %a, <8 x half> %c
	;			;
	; CODE-LABEL: v8f16_select_une			; CODE-LABEL: v8f16_select_une
	; CODE: bb.0			; CODE: bb.0
	; CODE-NEXT: fcmeq v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h			; CODE-NEXT: fcmeq v{{.+}}.8h, v{{.+}}.8h, v{{.+}}.8h
	; CODE-NEXT: bit v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b			; CODE-NEXT: bit v{{.+}}.16b, v{{.+}}.16b, v{{.+}}.16b

llvm/test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -passes=loop-vectorize -disable-output -debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefix=COST			; RUN: opt < %s -passes=loop-vectorize -disable-output -debug-only=loop-vectorize 2>&1 | FileCheck %s --check-prefix=COST
	; RUN: opt < %s -passes=loop-vectorize,instcombine,simplifycfg -force-vector-width=2 -simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s			; RUN: opt < %s -passes=loop-vectorize,instcombine,simplifycfg -force-vector-width=2 -simplifycfg-require-and-preserve-domtree=1 -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; This test checks that we correctly compute the scalarized operands for a			; This test checks that we correctly compute the scalarized operands for a
	; user-specified vectorization factor when interleaving is disabled. We use the			; user-specified vectorization factor when interleaving is disabled. We use the
	; "optsize" attribute to disable all interleaving calculations. A cost of 4			; "optsize" attribute to disable all interleaving calculations. A cost of 4
	; for %var4 indicates that we would scalarize its operand (%var3), giving			; for %var4 indicates that we would scalarize its operand (%var3), giving
	; %var4 a lower scalarization overhead.			; %var4 a lower scalarization overhead.
	;			;
	; COST-LABEL: predicated_udiv_scalarized_operand			; COST-LABEL: predicated_udiv_scalarized_operand
	; COST: LV: Found an estimated cost of 4 for VF 2 For instruction: %var4 = udiv i64 %var2, %var3			; COST: LV: Found an estimated cost of 5 for VF 2 For instruction: %var4 = udiv i64 %var2, %var3
	;			;
	;			;
	define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) optsize {			define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) optsize {
	; CHECK-LABEL: @predicated_udiv_scalarized_operand(			; CHECK-LABEL: @predicated_udiv_scalarized_operand(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_UDIV_CONTINUE2:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_UDIV_CONTINUE2:%.*]] ]

llvm/test/Transforms/LoopVectorize/AArch64/interleaved-vs-scalar.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -force-vector-width=2 -force-vector-interleave=1 -passes=loop-vectorize -S --debug-only=loop-vectorize 2>&1 | FileCheck %s			; RUN: opt < %s -force-vector-width=2 -force-vector-interleave=1 -passes=loop-vectorize -S --debug-only=loop-vectorize 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	%pair = type { i8, i8 }			%pair = type { i8, i8 }

	; CHECK-LABEL: test			; CHECK-LABEL: test
	; CHECK: Found an estimated cost of 14 for VF 2 For instruction: {{.*}} load i8			; CHECK: Found an estimated cost of 16 for VF 2 For instruction: {{.*}} load i8
	; CHECK: Found an estimated cost of 0 for VF 2 For instruction: {{.*}} load i8			; CHECK: Found an estimated cost of 0 for VF 2 For instruction: {{.*}} load i8
	; CHECK-LABEL: entry:			; CHECK-LABEL: entry:
	; CHECK-LABEL: vector.body:			; CHECK-LABEL: vector.body:
	; CHECK: [[LOAD1:%.*]] = load i8			; CHECK: [[LOAD1:%.*]] = load i8
	; CHECK: [[LOAD2:%.*]] = load i8			; CHECK: [[LOAD2:%.*]] = load i8
	; CHECK: [[INSERT:%.*]] = insertelement <2 x i8> poison, i8 [[LOAD1]], i32 0			; CHECK: [[INSERT:%.*]] = insertelement <2 x i8> poison, i8 [[LOAD1]], i32 0
	; CHECK: insertelement <2 x i8> [[INSERT]], i8 [[LOAD2]], i32 1			; CHECK: insertelement <2 x i8> [[INSERT]], i8 [[LOAD2]], i32 1
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body

llvm/test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll

	; The interleave factor in this test is 8, which is greater than the maximum			; The interleave factor in this test is 8, which is greater than the maximum
	; allowed factor for AArch64 (4). Thus, we will fall back to the basic TTI			; allowed factor for AArch64 (4). Thus, we will fall back to the basic TTI
	; implementation for determining the cost of the interleaved load group. The			; implementation for determining the cost of the interleaved load group. The
	; stores do not form a legal interleaved group because the group would contain			; stores do not form a legal interleaved group because the group would contain
	; gaps.			; gaps.
	;			;
	; VF_2-LABEL: Checking a loop in 'i64_factor_8'			; VF_2-LABEL: Checking a loop in 'i64_factor_8'
	; VF_2: Found an estimated cost of 10 for VF 2 For instruction: %tmp2 = load i64, ptr %tmp0, align 8			; VF_2: Found an estimated cost of 16 for VF 2 For instruction: %tmp2 = load i64, ptr %tmp0, align 8
	; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, ptr %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, ptr %tmp1, align 8
	; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, ptr %tmp0, align 8			; VF_2-NEXT: Found an estimated cost of 8 for VF 2 For instruction: store i64 0, ptr %tmp0, align 8
	; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, ptr %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 8 for VF 2 For instruction: store i64 0, ptr %tmp1, align 8
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.8, ptr %data, i64 %i, i32 2			%tmp0 = getelementptr inbounds %i64.8, ptr %data, i64 %i, i32 2
	%tmp1 = getelementptr inbounds %i64.8, ptr %data, i64 %i, i32 6			%tmp1 = getelementptr inbounds %i64.8, ptr %data, i64 %i, i32 6
	%tmp2 = load i64, ptr %tmp0, align 8			%tmp2 = load i64, ptr %tmp0, align 8
	%tmp3 = load i64, ptr %tmp1, align 8			%tmp3 = load i64, ptr %tmp1, align 8
	store i64 0, ptr %tmp0, align 8			store i64 0, ptr %tmp0, align 8
	store i64 0, ptr %tmp1, align 8			store i64 0, ptr %tmp1, align 8
	%i.next = add nuw nsw i64 %i, 1			%i.next = add nuw nsw i64 %i, 1
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

llvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -S -debug-only=loop-vectorize < %s 2>%t | FileCheck %s			; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -S -debug-only=loop-vectorize < %s 2>%t | FileCheck %s
	; RUN: cat %t | FileCheck %s --check-prefix=CHECK-COST			; RUN: cat %t | FileCheck %s --check-prefix=CHECK-COST

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; CHECK-COST: Checking a loop in 'fixed_width'			; CHECK-COST: Checking a loop in 'fixed_width'
	; CHECK-COST: Found an estimated cost of 11 for VF 2 For instruction: store i32 2, ptr %arrayidx1, align 4			; CHECK-COST: Found an estimated cost of 12 for VF 2 For instruction: store i32 2, ptr %arrayidx1, align 4
	; CHECK-COST: Found an estimated cost of 25 for VF 4 For instruction: store i32 2, ptr %arrayidx1, align 4			; CHECK-COST: Found an estimated cost of 24 for VF 4 For instruction: store i32 2, ptr %arrayidx1, align 4
	; CHECK-COST: Selecting VF: 1.			; CHECK-COST: Selecting VF: 1.

	; We should decide this loop is not worth vectorising using fixed width vectors			; We should decide this loop is not worth vectorising using fixed width vectors
	define void @fixed_width(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) #0 {			define void @fixed_width(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) #0 {
	; CHECK-LABEL: @fixed_width(			; CHECK-LABEL: @fixed_width(
	; CHECK-NOT: vector.body			; CHECK-NOT: vector.body
	entry:			entry:
	%cmp6 = icmp sgt i64 %n, 0			%cmp6 = icmp sgt i64 %n, 0

llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll

	; CHECK-LABEL: predicated_udiv			; CHECK-LABEL: predicated_udiv
	;			;
	; This test checks that we correctly compute the cost of the predicated udiv			; This test checks that we correctly compute the cost of the predicated udiv
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
	;			;
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Found an estimated cost of 7 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv(ptr %a, ptr %b, i1 %c, i64 %n) {			define i32 @predicated_udiv(ptr %a, ptr %b, i1 %c, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]

	; CHECK-LABEL: predicated_store			; CHECK-LABEL: predicated_store
	;			;
	; This test checks that we correctly compute the cost of the predicated store			; This test checks that we correctly compute the cost of the predicated store
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of store:			; Cost of store:
	; (store(4) + extractelement(3)) / 2 = 3			; (store(4) + extractelement(4)) / 2 = 4
	;			;
	; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %tmp0, align 4
	; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, ptr %tmp0, align 4			; CHECK: Found an estimated cost of 4 for VF 2 For instruction: store i32 %tmp2, ptr %tmp0, align 4
	;			;
	define void @predicated_store(ptr %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store(ptr %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, ptr %a, i64 %i			%tmp0 = getelementptr inbounds i32, ptr %a, i64 %i
	; CHECK-LABEL: predicated_store_phi			; CHECK-LABEL: predicated_store_phi
	;			;
	; Same as predicated_store except we use a pointer PHI to maintain the address			; Same as predicated_store except we use a pointer PHI to maintain the address
	;			;
	; CHECK: Found scalar instruction: %addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]			; CHECK: Found scalar instruction: %addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]
	; CHECK: Found scalar instruction: %addr.next = getelementptr inbounds i32, ptr %addr, i64 1			; CHECK: Found scalar instruction: %addr.next = getelementptr inbounds i32, ptr %addr, i64 1
	; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %addr, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %addr, align 4
	; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]			; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]
	; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, ptr %addr, align 4			; CHECK: Found an estimated cost of 4 for VF 2 For instruction: store i32 %tmp2, ptr %addr, align 4
	;			;
	define void @predicated_store_phi(ptr %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store_phi(ptr %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]			%addr = phi ptr [ %a, %entry ], [ %addr.next, %for.inc ]
	; CHECK-LABEL: predicated_udiv_scalarized_operand			; CHECK-LABEL: predicated_udiv_scalarized_operand
	;			;
	; This test checks that we correctly compute the cost of the predicated udiv			; This test checks that we correctly compute the cost of the predicated udiv
	; instruction and the add instruction it uses. The add is scalarized and sunk			; instruction and the add instruction it uses. The add is scalarized and sunk
	; inside the predicated block. If we assume the block probability is 50%, we			; inside the predicated block. If we assume the block probability is 50%, we
	; compute the cost as:			; compute the cost as:
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(4)) / 2 = 3
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4			; (udiv(2) + extractelement(4) + insertelement(4)) / 2 = 5
	;			;
	; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x			; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x			; CHECK: Found an estimated cost of 3 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {			define i32 @predicated_udiv_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 21 Lines
	; CHECK-LABEL: predicated_store_scalarized_operand			; CHECK-LABEL: predicated_store_scalarized_operand
	;			;
	; This test checks that we correctly compute the cost of the predicated store			; This test checks that we correctly compute the cost of the predicated store
	; instruction and the add instruction it uses. The add is scalarized and sunk			; instruction and the add instruction it uses. The add is scalarized and sunk
	; inside the predicated block. If we assume the block probability is 50%, we			; inside the predicated block. If we assume the block probability is 50%, we
	; compute the cost as:			; compute the cost as:
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(4)) / 2 = 3
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x			; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, ptr %tmp0, align 4
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x			; CHECK: Found an estimated cost of 3 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, ptr %tmp0, align 4			; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, ptr %tmp0, align 4
	;			;
	define void @predicated_store_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store_scalarized_operand(ptr %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	Show All 22 Lines
	; and predicated. The sub feeding the store is scalarized and sunk inside the			; and predicated. The sub feeding the store is scalarized and sunk inside the
	; store's predicated block. However, the add feeding the sdiv and udiv cannot			; store's predicated block. However, the add feeding the sdiv and udiv cannot
	; be sunk and is not scalarized. If we assume the block probability is 50%, we			; be sunk and is not scalarized. If we assume the block probability is 50%, we
	; compute the cost as:			; compute the cost as:
	;			;
	; Cost of add:			; Cost of add:
	; add(1) = 1			; add(1) = 1
	; Cost of sdiv:			; Cost of sdiv:
	; (sdiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (sdiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(8) + insertelement(4)) / 2 = 7
	; Cost of sub:			; Cost of sub:
	; (sub(2) + extractelement(3)) / 2 = 2			; (sub(2) + extractelement(4)) / 2 = 3
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x			; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2			; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x			; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp5, ptr %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp5, ptr %tmp0, align 4
	; CHECK: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x			; CHECK: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2			; CHECK: Found an estimated cost of 7 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2			; CHECK: Found an estimated cost of 7 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x			; CHECK: Found an estimated cost of 3 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, ptr %tmp0, align 4			; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, ptr %tmp0, align 4
	;			;
	define void @predication_multi_context(ptr %a, i1 %c, i32 %x, i64 %n) {			define void @predication_multi_context(ptr %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	Show All 20 Lines
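
The cost comments in the predication tests above all follow one pattern: add the scalar operation cost to any insert/extract overhead, then halve the sum for the assumed 50% block probability. A minimal sketch of that arithmetic follows, assuming integer (truncating) division, which is what the rounded figures in the comments imply. The helper name `predicated_cost` is hypothetical and is not an LLVM API; the input numbers are copied from the test comments in both columns of the diff.

```python
def predicated_cost(op_cost, extract_cost=0, insert_cost=0, divisor=2):
    """Hypothetical helper (not an LLVM API).

    Sums the scalar op cost and the insert/extract overhead, then scales by
    the assumed 50% block probability using integer division.
    """
    return (op_cost + extract_cost + insert_cost) // divisor


# Figures from the left-hand column of the diff.
left = {
    "add": predicated_cost(2, extract_cost=3),
    "udiv": predicated_cost(2, extract_cost=3, insert_cost=3),
    "sdiv": predicated_cost(2, extract_cost=6, insert_cost=3),
    "store": predicated_cost(4),
}

# Figures from the right-hand column of the diff.
right = {
    "add": predicated_cost(2, extract_cost=4),
    "udiv": predicated_cost(2, extract_cost=4, insert_cost=4),
    "sdiv": predicated_cost(2, extract_cost=8, insert_cost=4),
    "store": predicated_cost(4),
}

print(left)
print(right)
```

The results agree with the `Found an estimated cost of ...` CHECK lines in each column, including the cases where an odd sum is rounded down.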

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -passes=loop-vectorize -debug -disable-output -force-ordered-reductions=true -hints-allow-reordering=false \			; RUN: opt < %s -passes=loop-vectorize -debug -disable-output -force-ordered-reductions=true -hints-allow-reordering=false \
	; RUN: -force-vector-width=4 -force-vector-interleave=1 -S 2>&1 | FileCheck %s --check-prefix=CHECK-VF4			; RUN: -force-vector-width=4 -force-vector-interleave=1 -S 2>&1 | FileCheck %s --check-prefix=CHECK-VF4
	; RUN: opt < %s -passes=loop-vectorize -debug -disable-output -force-ordered-reductions=true -hints-allow-reordering=false \			; RUN: opt < %s -passes=loop-vectorize -debug -disable-output -force-ordered-reductions=true -hints-allow-reordering=false \
	; RUN: -force-vector-width=8 -force-vector-interleave=1 -S 2>&1 | FileCheck %s --check-prefix=CHECK-VF8			; RUN: -force-vector-width=8 -force-vector-interleave=1 -S 2>&1 | FileCheck %s --check-prefix=CHECK-VF8

	target triple="aarch64-unknown-linux-gnu"			target triple="aarch64-unknown-linux-gnu"

	; CHECK-VF4: Found an estimated cost of 17 for VF 4 For instruction: %add = fadd float %0, %sum.07			; CHECK-VF4: Found an estimated cost of 14 for VF 4 For instruction: %add = fadd float %0, %sum.07
	; CHECK-VF8: Found an estimated cost of 34 for VF 8 For instruction: %add = fadd float %0, %sum.07			; CHECK-VF8: Found an estimated cost of 28 for VF 8 For instruction: %add = fadd float %0, %sum.07

	define float @fadd_strict32(ptr noalias nocapture readonly %a, i64 %n) {			define float @fadd_strict32(ptr noalias nocapture readonly %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]			%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
	%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv
	%0 = load float, ptr %arrayidx, align 4			%0 = load float, ptr %arrayidx, align 4
	%add = fadd float %0, %sum.07			%add = fadd float %0, %sum.07
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %iv.next, %n			%exitcond.not = icmp eq i64 %iv.next, %n
	br i1 %exitcond.not, label %for.end, label %for.body			br i1 %exitcond.not, label %for.end, label %for.body

	for.end:			for.end:
	ret float %add			ret float %add
	}			}


	; CHECK-VF4: Found an estimated cost of 14 for VF 4 For instruction: %add = fadd double %0, %sum.07			; CHECK-VF4: Found an estimated cost of 12 for VF 4 For instruction: %add = fadd double %0, %sum.07
	; CHECK-VF8: Found an estimated cost of 28 for VF 8 For instruction: %add = fadd double %0, %sum.07			; CHECK-VF8: Found an estimated cost of 24 for VF 8 For instruction: %add = fadd double %0, %sum.07

	define double @fadd_strict64(ptr noalias nocapture readonly %a, i64 %n) {			define double @fadd_strict64(ptr noalias nocapture readonly %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]			%sum.07 = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]
	%arrayidx = getelementptr inbounds double, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds double, ptr %a, i64 %iv
	%0 = load double, ptr %arrayidx, align 4			%0 = load double, ptr %arrayidx, align 4
	%add = fadd double %0, %sum.07			%add = fadd double %0, %sum.07
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %iv.next, %n			%exitcond.not = icmp eq i64 %iv.next, %n
	br i1 %exitcond.not, label %for.end, label %for.body			br i1 %exitcond.not, label %for.end, label %for.body

	for.end:			for.end:
	ret double %add			ret double %add
	}			}

	; CHECK-VF4: Found an estimated cost of 19 for VF 4 For instruction: %muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)			; CHECK-VF4: Found an estimated cost of 16 for VF 4 For instruction: %muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)
	; CHECK-VF8: Found an estimated cost of 38 for VF 8 For instruction: %muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)			; CHECK-VF8: Found an estimated cost of 32 for VF 8 For instruction: %muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)
				efriedma: This cost modeling is weird.
				dmgreen (author): This is an in-order reduction.
				efriedma: Then why does it cost 1 at VF 4?
				dmgreen (author): It, for some reason, prints the costs twice and these are matching two different things. There are some details in 2e0bf67df1437cb0156d7f5dd9e1b701749f96ca. I'll rebase on the updated costs.

	define float @fmuladd_strict32(ptr %a, ptr %b, i64 %n) {			define float @fmuladd_strict32(ptr %a, ptr %b, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi float [ 0.000000e+00, %entry ], [ %muladd, %for.body ]			%sum.07 = phi float [ 0.000000e+00, %entry ], [ %muladd, %for.body ]
	%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds float, ptr %a, i64 %iv
	%0 = load float, ptr %arrayidx, align 4			%0 = load float, ptr %arrayidx, align 4
	%arrayidx2 = getelementptr inbounds float, ptr %b, i64 %iv			%arrayidx2 = getelementptr inbounds float, ptr %b, i64 %iv
	%1 = load float, ptr %arrayidx2, align 4			%1 = load float, ptr %arrayidx2, align 4
	%muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)			%muladd = tail call float @llvm.fmuladd.f32(float %0, float %1, float %sum.07)
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %iv.next, %n			%exitcond.not = icmp eq i64 %iv.next, %n
	br i1 %exitcond.not, label %for.end, label %for.body			br i1 %exitcond.not, label %for.end, label %for.body

	for.end:			for.end:
	ret float %muladd			ret float %muladd
	}			}

	declare float @llvm.fmuladd.f32(float, float, float)			declare float @llvm.fmuladd.f32(float, float, float)

	; CHECK-VF4: Found an estimated cost of 18 for VF 4 For instruction: %muladd = tail call double @llvm.fmuladd.f64(double %0, double %1, double %sum.07)			; CHECK-VF4: Found an estimated cost of 16 for VF 4 For instruction: %muladd = tail call double @llvm.fmuladd.f64(double %0, double %1, double %sum.07)
	; CHECK-VF8: Found an estimated cost of 36 for VF 8 For instruction: %muladd = tail call double @llvm.fmuladd.f64(double %0, double %1, double %sum.07)			; CHECK-VF8: Found an estimated cost of 32 for VF 8 For instruction: %muladd = tail call double @llvm.fmuladd.f64(double %0, double %1, double %sum.07)

	define double @fmuladd_strict64(ptr %a, ptr %b, i64 %n) {			define double @fmuladd_strict64(ptr %a, ptr %b, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi double [ 0.000000e+00, %entry ], [ %muladd, %for.body ]			%sum.07 = phi double [ 0.000000e+00, %entry ], [ %muladd, %for.body ]
	Show All 14 Lines

llvm/test/Transforms/LoopVectorize/AArch64/unsafe-vf-hint-remark.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s			; RUN: opt -passes=loop-vectorize -mtriple=arm64-apple-iphoneos -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s

	; Specify a large unsafe vectorization factor of 32 that gets clamped to 16,			; Specify a large unsafe vectorization factor of 32 that gets clamped to 16.
	; then test an even smaller VF of 2 is selected based on the cost-model.

	; CHECK: LV: User VF=32 is unsafe, clamping to max safe VF=16.			; CHECK: LV: User VF=32 is unsafe, clamping to max safe VF=16.
	; CHECK: remark: <unknown>:0:0: User-specified vectorization factor 32 is unsafe, clamping to maximum safe vectorization factor 16			; CHECK: remark: <unknown>:0:0: User-specified vectorization factor 32 is unsafe, clamping to maximum safe vectorization factor 16
	; CHECK: LV: Selecting VF: 2.			; CHECK: LV: Selecting VF: 16.
	; CHECK-LABEL: @test			; CHECK-LABEL: @test
	; CHECK: <2 x i64>			; CHECK: <16 x i64>
	define void @test(ptr nocapture %a, ptr nocapture readonly %b) {			define void @test(ptr nocapture %a, ptr nocapture readonly %b) {
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
	%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv			%arrayidx = getelementptr inbounds i64, ptr %a, i64 %iv
	%0 = load i64, ptr %arrayidx, align 4			%0 = load i64, ptr %arrayidx, align 4
	Show All 24 Lines

llvm/test/Transforms/LowerMatrixIntrinsics/dot-product-float.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; REQUIRES: aarch64-registered-target		; REQUIRES: aarch64-registered-target
; RUN: opt -passes='lower-matrix-intrinsics' -mtriple=arm64-apple-iphoneos -S < %s | FileCheck %s		; RUN: opt -passes='lower-matrix-intrinsics' -mtriple=arm64-apple-iphoneos -S < %s | FileCheck %s

define <1 x float> @dotproduct_float_v6(<6 x float> %a, <6 x float> %b) {		define <1 x float> @dotproduct_float_v6(<6 x float> %a, <6 x float> %b) {
; CHECK-LABEL: @dotproduct_float_v6(		; CHECK-LABEL: @dotproduct_float_v6(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[SPLIT:%.*]] = shufflevector <6 x float> [[A:%.*]], <6 x float> poison, <1 x i32> zeroinitializer		; CHECK-NEXT: [[TMP0:%.*]] = fmul <6 x float> [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[SPLIT1:%.*]] = shufflevector <6 x float> [[A]], <6 x float> poison, <1 x i32> <i32 1>		; CHECK-NEXT: [[TMP1:%.*]] = call fast float @llvm.vector.reduce.fadd.v6f32(float 0.000000e+00, <6 x float> [[TMP0]])
; CHECK-NEXT: [[SPLIT2:%.*]] = shufflevector <6 x float> [[A]], <6 x float> poison, <1 x i32> <i32 2>		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <1 x float> poison, float [[TMP1]], i64 0
; CHECK-NEXT: [[SPLIT3:%.*]] = shufflevector <6 x float> [[A]], <6 x float> poison, <1 x i32> <i32 3>		; CHECK-NEXT: ret <1 x float> [[TMP2]]
; CHECK-NEXT: [[SPLIT4:%.*]] = shufflevector <6 x float> [[A]], <6 x float> poison, <1 x i32> <i32 4>
; CHECK-NEXT: [[SPLIT5:%.*]] = shufflevector <6 x float> [[A]], <6 x float> poison, <1 x i32> <i32 5>
; CHECK-NEXT: [[SPLIT6:%.*]] = shufflevector <6 x float> [[B:%.*]], <6 x float> poison, <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>
; CHECK-NEXT: [[BLOCK:%.*]] = shufflevector <1 x float> [[SPLIT]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP0:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 0
; CHECK-NEXT: [[SPLAT_SPLATINSERT:%.*]] = insertelement <1 x float> poison, float [[TMP0]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <1 x float> [[BLOCK]], [[SPLAT_SPLAT]]
; CHECK-NEXT: [[BLOCK7:%.*]] = shufflevector <1 x float> [[SPLIT1]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 1
; CHECK-NEXT: [[SPLAT_SPLATINSERT8:%.*]] = insertelement <1 x float> poison, float [[TMP2]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT9:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT8]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP3:%.*]] = call fast <1 x float> @llvm.fmuladd.v1f32(<1 x float> [[BLOCK7]], <1 x float> [[SPLAT_SPLAT9]], <1 x float> [[TMP1]])
; CHECK-NEXT: [[BLOCK10:%.*]] = shufflevector <1 x float> [[SPLIT2]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 2
; CHECK-NEXT: [[SPLAT_SPLATINSERT11:%.*]] = insertelement <1 x float> poison, float [[TMP4]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT11]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = call fast <1 x float> @llvm.fmuladd.v1f32(<1 x float> [[BLOCK10]], <1 x float> [[SPLAT_SPLAT12]], <1 x float> [[TMP3]])
; CHECK-NEXT: [[BLOCK13:%.*]] = shufflevector <1 x float> [[SPLIT3]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 3
; CHECK-NEXT: [[SPLAT_SPLATINSERT14:%.*]] = insertelement <1 x float> poison, float [[TMP6]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT14]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call fast <1 x float> @llvm.fmuladd.v1f32(<1 x float> [[BLOCK13]], <1 x float> [[SPLAT_SPLAT15]], <1 x float> [[TMP5]])
; CHECK-NEXT: [[BLOCK16:%.*]] = shufflevector <1 x float> [[SPLIT4]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 4
; CHECK-NEXT: [[SPLAT_SPLATINSERT17:%.*]] = insertelement <1 x float> poison, float [[TMP8]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT18:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT17]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = call fast <1 x float> @llvm.fmuladd.v1f32(<1 x float> [[BLOCK16]], <1 x float> [[SPLAT_SPLAT18]], <1 x float> [[TMP7]])
; CHECK-NEXT: [[BLOCK19:%.*]] = shufflevector <1 x float> [[SPLIT5]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <6 x float> [[SPLIT6]], i64 5
; CHECK-NEXT: [[SPLAT_SPLATINSERT20:%.*]] = insertelement <1 x float> poison, float [[TMP10]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT21:%.*]] = shufflevector <1 x float> [[SPLAT_SPLATINSERT20]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP11:%.*]] = call fast <1 x float> @llvm.fmuladd.v1f32(<1 x float> [[BLOCK19]], <1 x float> [[SPLAT_SPLAT21]], <1 x float> [[TMP9]])
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <1 x float> [[TMP11]], <1 x float> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <1 x float> poison, <1 x float> [[TMP12]], <1 x i32> <i32 1>
; CHECK-NEXT: ret <1 x float> [[TMP13]]
;		;
entry:		entry:
%c = tail call fast <1 x float> @llvm.matrix.multiply.v1f32.v6f32.v6f32(<6 x float> %a, <6 x float> %b, i32 1, i32 6, i32 1)		%c = tail call fast <1 x float> @llvm.matrix.multiply.v1f32.v6f32.v6f32(<6 x float> %a, <6 x float> %b, i32 1, i32 6, i32 1)
ret <1 x float> %c		ret <1 x float> %c
}		}

declare <1 x float> @llvm.matrix.multiply.v1f32.v6f32.v6f32(<6 x float>, <6 x float>, i32, i32, i32)		declare <1 x float> @llvm.matrix.multiply.v1f32.v6f32.v6f32(<6 x float>, <6 x float>, i32, i32, i32)

▲ Show 20 Lines • Show All 114 Lines • ▼ Show 20 Lines	entry:
ret <1 x double> %c		ret <1 x double> %c
}		}

declare <1 x double> @llvm.matrix.multiply.v1f64.v6f64.v6f64(<6 x double>, <6 x double>, i32, i32, i32)		declare <1 x double> @llvm.matrix.multiply.v1f64.v6f64.v6f64(<6 x double>, <6 x double>, i32, i32, i32)

define <1 x double> @intrinsic_column_major_load_dot_product_double_v6(ptr %lhs_address, ptr %rhs_address) {		define <1 x double> @intrinsic_column_major_load_dot_product_double_v6(ptr %lhs_address, ptr %rhs_address) {
; CHECK-LABEL: @intrinsic_column_major_load_dot_product_double_v6(		; CHECK-LABEL: @intrinsic_column_major_load_dot_product_double_v6(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[COL_LOAD:%.*]] = load <1 x double>, ptr [[LHS_ADDRESS:%.*]], align 4		; CHECK-NEXT: [[COL_LOAD:%.*]] = load <6 x double>, ptr [[RHS_ADDRESS:%.*]], align 4
; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, ptr [[LHS_ADDRESS]], i64 1		; CHECK-NEXT: [[TMP0:%.*]] = load <6 x double>, ptr [[LHS_ADDRESS:%.*]], align 64
; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <1 x double>, ptr [[VEC_GEP]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = fmul <6 x double> [[TMP0]], [[COL_LOAD]]
; CHECK-NEXT: [[VEC_GEP2:%.*]] = getelementptr double, ptr [[LHS_ADDRESS]], i64 2		; CHECK-NEXT: [[TMP2:%.*]] = call fast double @llvm.vector.reduce.fadd.v6f64(double 0.000000e+00, <6 x double> [[TMP1]])
; CHECK-NEXT: [[COL_LOAD3:%.*]] = load <1 x double>, ptr [[VEC_GEP2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <1 x double> poison, double [[TMP2]], i64 0
; CHECK-NEXT: [[VEC_GEP4:%.*]] = getelementptr double, ptr [[LHS_ADDRESS]], i64 3		; CHECK-NEXT: ret <1 x double> [[TMP3]]
; CHECK-NEXT: [[COL_LOAD5:%.*]] = load <1 x double>, ptr [[VEC_GEP4]], align 4
; CHECK-NEXT: [[VEC_GEP6:%.*]] = getelementptr double, ptr [[LHS_ADDRESS]], i64 4
; CHECK-NEXT: [[COL_LOAD7:%.*]] = load <1 x double>, ptr [[VEC_GEP6]], align 4
; CHECK-NEXT: [[VEC_GEP8:%.*]] = getelementptr double, ptr [[LHS_ADDRESS]], i64 5
; CHECK-NEXT: [[COL_LOAD9:%.*]] = load <1 x double>, ptr [[VEC_GEP8]], align 4
; CHECK-NEXT: [[COL_LOAD10:%.*]] = load <6 x double>, ptr [[RHS_ADDRESS:%.*]], align 4
; CHECK-NEXT: [[BLOCK:%.*]] = shufflevector <1 x double> [[COL_LOAD]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP0:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 0
; CHECK-NEXT: [[SPLAT_SPLATINSERT:%.*]] = insertelement <1 x double> poison, double [[TMP0]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <1 x double> [[BLOCK]], [[SPLAT_SPLAT]]
; CHECK-NEXT: [[BLOCK11:%.*]] = shufflevector <1 x double> [[COL_LOAD1]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 1
; CHECK-NEXT: [[SPLAT_SPLATINSERT12:%.*]] = insertelement <1 x double> poison, double [[TMP2]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT13:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT12]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP3:%.*]] = call fast <1 x double> @llvm.fmuladd.v1f64(<1 x double> [[BLOCK11]], <1 x double> [[SPLAT_SPLAT13]], <1 x double> [[TMP1]])
; CHECK-NEXT: [[BLOCK14:%.*]] = shufflevector <1 x double> [[COL_LOAD3]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 2
; CHECK-NEXT: [[SPLAT_SPLATINSERT15:%.*]] = insertelement <1 x double> poison, double [[TMP4]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT16:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT15]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = call fast <1 x double> @llvm.fmuladd.v1f64(<1 x double> [[BLOCK14]], <1 x double> [[SPLAT_SPLAT16]], <1 x double> [[TMP3]])
; CHECK-NEXT: [[BLOCK17:%.*]] = shufflevector <1 x double> [[COL_LOAD5]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 3
; CHECK-NEXT: [[SPLAT_SPLATINSERT18:%.*]] = insertelement <1 x double> poison, double [[TMP6]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT19:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT18]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call fast <1 x double> @llvm.fmuladd.v1f64(<1 x double> [[BLOCK17]], <1 x double> [[SPLAT_SPLAT19]], <1 x double> [[TMP5]])
; CHECK-NEXT: [[BLOCK20:%.*]] = shufflevector <1 x double> [[COL_LOAD7]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 4
; CHECK-NEXT: [[SPLAT_SPLATINSERT21:%.*]] = insertelement <1 x double> poison, double [[TMP8]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT22:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT21]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP9:%.*]] = call fast <1 x double> @llvm.fmuladd.v1f64(<1 x double> [[BLOCK20]], <1 x double> [[SPLAT_SPLAT22]], <1 x double> [[TMP7]])
; CHECK-NEXT: [[BLOCK23:%.*]] = shufflevector <1 x double> [[COL_LOAD9]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <6 x double> [[COL_LOAD10]], i64 5
; CHECK-NEXT: [[SPLAT_SPLATINSERT24:%.*]] = insertelement <1 x double> poison, double [[TMP10]], i64 0
; CHECK-NEXT: [[SPLAT_SPLAT25:%.*]] = shufflevector <1 x double> [[SPLAT_SPLATINSERT24]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP11:%.*]] = call fast <1 x double> @llvm.fmuladd.v1f64(<1 x double> [[BLOCK23]], <1 x double> [[SPLAT_SPLAT25]], <1 x double> [[TMP9]])
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <1 x double> [[TMP11]], <1 x double> poison, <1 x i32> zeroinitializer
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <1 x double> poison, <1 x double> [[TMP12]], <1 x i32> <i32 1>
; CHECK-NEXT: ret <1 x double> [[TMP13]]
;		;
entry:		entry:
%lhs = tail call fast <6 x double> @llvm.matrix.column.major.load.v6f64.i64(ptr nonnull align 4 %lhs_address, i64 1, i1 false, i32 1, i32 6)		%lhs = tail call fast <6 x double> @llvm.matrix.column.major.load.v6f64.i64(ptr nonnull align 4 %lhs_address, i64 1, i1 false, i32 1, i32 6)
%rhs = tail call fast <6 x double> @llvm.matrix.column.major.load.v6f64.i64(ptr nonnull align 4 %rhs_address, i64 6, i1 false, i32 6, i32 1)		%rhs = tail call fast <6 x double> @llvm.matrix.column.major.load.v6f64.i64(ptr nonnull align 4 %rhs_address, i64 6, i1 false, i32 6, i32 1)
%result = tail call fast <1 x double> @llvm.matrix.multiply.v1f64.v6f64.v6f64(<6 x double> %lhs, <6 x double> %rhs, i32 1, i32 6, i32 1)		%result = tail call fast <1 x double> @llvm.matrix.multiply.v1f64.v6f64.v6f64(<6 x double> %lhs, <6 x double> %rhs, i32 1, i32 6, i32 1)
ret <1 x double> %result		ret <1 x double> %result
}		}

Show All 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/ext-trunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64--linux-gnu < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64--linux-gnu < %s | FileCheck %s

	target datalayout = "e-m:e-i32:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i32:64-i128:128-n32:64-S128"

	declare void @foo(i64, i64, i64, i64)			declare void @foo(i64, i64, i64, i64)

	define void @test1(<4 x i16> %a, <4 x i16> %b, ptr %p) {			define void @test1(<4 x i16> %a, <4 x i16> %b, ptr %p) {
	; Make sure types of sub and its sources are not extended.			; Make sure types of sub and its sources are not extended.
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[TMP0:%.*]] = sext <4 x i32> [[SUB0]] to <4 x i64>			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i64> [[TMP0]], i32 0			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[S0]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i64> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[SUB0]], <4 x i32> poison, <2 x i32> <i32 1, i32 2>
				; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i32> [[TMP0]] to <2 x i64>
				; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP2]]			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP2]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i64> [[TMP0]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP3]]			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP3]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4			; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP0]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP4]]			; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[S3]]
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1
	}			}

	define void @test2(<4 x i16> %a, <4 x i16> %b, i64 %c0, i64 %c1, i64 %c2, i64 %c3, ptr %p) {			define void @test2(<4 x i16> %a, <4 x i16> %b, i64 %c0, i64 %c1, i64 %c2, i64 %c3, ptr %p) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z0:%.*]] = zext <4 x i16> [[A:%.*]] to <4 x i32>
	; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>			; CHECK-NEXT: [[Z1:%.*]] = zext <4 x i16> [[B:%.*]] to <4 x i32>
	; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]			; CHECK-NEXT: [[SUB0:%.*]] = sub <4 x i32> [[Z0]], [[Z1]]
	; CHECK-NEXT: [[TMP0:%.*]] = sext <4 x i32> [[SUB0]] to <4 x i64>			; CHECK-NEXT: [[E0:%.*]] = extractelement <4 x i32> [[SUB0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[C0:%.*]], i32 0			; CHECK-NEXT: [[S0:%.*]] = sext i32 [[E0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i64> [[TMP1]], i64 [[C1:%.*]], i32 1			; CHECK-NEXT: [[A0:%.*]] = add i64 [[S0]], [[C0:%.*]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[C2:%.*]], i32 2			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[A0]]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[C3:%.*]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP0]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds i64, ptr [[P:%.*]], i64 [[TMP6]]
	; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4			; CHECK-NEXT: [[LOAD0:%.*]] = load i64, ptr [[GEP0]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 1			; CHECK-NEXT: [[E1:%.*]] = extractelement <4 x i32> [[SUB0]], i32 1
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP7]]			; CHECK-NEXT: [[S1:%.*]] = sext i32 [[E1]] to i64
				; CHECK-NEXT: [[A1:%.*]] = add i64 [[S1]], [[C1:%.*]]
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A1]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4			; CHECK-NEXT: [[LOAD1:%.*]] = load i64, ptr [[GEP1]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i64> [[TMP5]], i32 2			; CHECK-NEXT: [[E2:%.*]] = extractelement <4 x i32> [[SUB0]], i32 2
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP8]]			; CHECK-NEXT: [[S2:%.*]] = sext i32 [[E2]] to i64
				; CHECK-NEXT: [[A2:%.*]] = add i64 [[S2]], [[C2:%.*]]
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A2]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4			; CHECK-NEXT: [[LOAD2:%.*]] = load i64, ptr [[GEP2]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i64> [[TMP5]], i32 3			; CHECK-NEXT: [[E3:%.*]] = extractelement <4 x i32> [[SUB0]], i32 3
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[TMP9]]			; CHECK-NEXT: [[S3:%.*]] = sext i32 [[E3]] to i64
				; CHECK-NEXT: [[A3:%.*]] = add i64 [[S3]], [[C3:%.*]]
				; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds i64, ptr [[P]], i64 [[A3]]
	; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4			; CHECK-NEXT: [[LOAD3:%.*]] = load i64, ptr [[GEP3]], align 4
	; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])			; CHECK-NEXT: call void @foo(i64 [[LOAD0]], i64 [[LOAD1]], i64 [[LOAD2]], i64 [[LOAD3]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%z0 = zext <4 x i16> %a to <4 x i32>			%z0 = zext <4 x i16> %a to <4 x i32>
	%z1 = zext <4 x i16> %b to <4 x i32>			%z1 = zext <4 x i16> %b to <4 x i32>
	%sub0 = sub <4 x i32> %z0, %z1			%sub0 = sub <4 x i32> %z0, %z1

llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -S -passes=slp-vectorizer,instcombine -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -S -passes=slp-vectorizer,instcombine -pass-remarks-output=%t | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=REMARK %s			; RUN: cat %t | FileCheck -check-prefix=REMARK %s
	; RUN: opt < %s -S -aa-pipeline=basic-aa -passes='slp-vectorizer,instcombine' -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -S -aa-pipeline=basic-aa -passes='slp-vectorizer,instcombine' -pass-remarks-output=%t | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=REMARK %s			; RUN: cat %t | FileCheck -check-prefix=REMARK %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; REMARK-LABEL: Function: gather_multiple_use			; REMARK-LABEL: Function: gather_multiple_use
	; REMARK: Args:			; REMARK: Args:
	; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '			; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
	; REMARK-NEXT: - Cost: '-7'			; REMARK-NEXT: - Cost: '-8'
	;			;
	; REMARK-NOT: Function: gather_load			; REMARK-NOT: Function: gather_load

	define internal i32 @gather_multiple_use(i32 %a, i32 %b, i32 %c, i32 %d) {			define internal i32 @gather_multiple_use(i32 %a, i32 %b, i32 %c, i32 %d) {
	; CHECK-LABEL: @gather_multiple_use(			; CHECK-LABEL: @gather_multiple_use(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A:%.*]], i64 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A:%.*]], i64 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i64 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i64 2

llvm/test/Transforms/SLPVectorizer/AArch64/getelementptr.ll


	; YAML-LABEL: Function: getelementptr_4x32			; YAML-LABEL: Function: getelementptr_4x32
	; YAML: --- !Passed			; YAML: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: getelementptr_4x32			; YAML-NEXT: Function: getelementptr_4x32
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'SLP vectorized with cost '
	; YAML-NEXT: - Cost: '6'			; YAML-NEXT: - Cost: '4'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '3'			; YAML-NEXT: - TreeSize: '3'

	; YAML: --- !Passed			; YAML: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: getelementptr_4x32			; YAML-NEXT: Function: getelementptr_4x32
	; YAML-NEXT: Args:			; YAML-NEXT: Args:

llvm/test/Transforms/SLPVectorizer/AArch64/landing_pad.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -mtriple=aarch64-unknown-linux -S -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -mtriple=aarch64-unknown-linux -S -pass-remarks-output=%t | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	; YAML-LABEL: --- !Passed			; YAML-LABEL: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'SLP vectorized with cost '
	; YAML-NEXT: - Cost: '3'			; YAML-NEXT: - Cost: '2'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '2'			; YAML-NEXT: - TreeSize: '2'

	; YAML-LABEL: --- !Passed			; YAML-LABEL: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'SLP vectorized with cost '
	; YAML-NEXT: - Cost: '0'			; YAML-NEXT: - Cost: '0'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '3'			; YAML-NEXT: - TreeSize: '3'

	; YAML-LABEL: --- !Passed			; YAML-LABEL: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'SLP vectorized with cost '
	; YAML-NEXT: - Cost: '1'			; YAML-NEXT: - Cost: '2'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '9'			; YAML-NEXT: - TreeSize: '9'

	define void @foo() personality ptr @bar {			define void @foo() personality ptr @bar {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2.loopexit:			; CHECK: bb2.loopexit:

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=aarch64-unknown-unknown -mcpu=cortex-a53 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=aarch64-unknown-unknown -mcpu=cortex-a53 | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	; This test is reduced from the matrix multiplication benchmark in the test-suite:			; This test is reduced from the matrix multiplication benchmark in the test-suite:
	; https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/Benchmarks/Misc/matmul_f64_4x4.c			; https://github.com/llvm/llvm-test-suite/tree/main/SingleSource/Benchmarks/Misc/matmul_f64_4x4.c
	; The operations here are expected to be vectorized to <2 x double>.			; The operations here are expected to be vectorized to <2 x double>.
	; Otherwise, performance will suffer on Cortex-A53.			; Otherwise, performance will suffer on Cortex-A53.
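
As a reading aid, the scalar arithmetic that the visible CHECK lines of `@wrap_mul4` describe can be sketched in C. This is a hypothetical reconstruction from the IR shown in this diff, not the benchmark source: the name `wrap_mul4_sketch` and the flat row-major layout (`a` as 2x2, `b` as 2x4, `out` as 2x4) are assumptions made for illustration. Each output row is a sum of two scaled rows of `b`, which is why the operations pair up naturally into `<2 x double>` vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (names and layout are assumptions, not the benchmark
 * source): out (2x4) = a (2x2) * b (2x4), all stored row-major in flat arrays.
 * This mirrors the visible IR, e.g. out[0] = a[0]*b[0] + a[1]*b[4]. */
void wrap_mul4_sketch(double *out, const double *a, const double *b) {
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < 2; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            /* Row i of a times column j of b; adjacent j values are the
             * lanes that SLP can pack into one <2 x double>. */
            out[i * 4 + j] = a[i * 2 + 0] * b[0 * 4 + j]
                           + a[i * 2 + 1] * b[1 * 4 + j];
        }
    }
}
```

In the vectorized column of the diff, the scalar `a` values are splatted with `insertelement` plus `shufflevector` and multiplied against `<2 x double>` loads from `b`, which corresponds to handling two adjacent `j` values of the inner loop at once.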

	define void @wrap_mul4(ptr nocapture %Out, ptr nocapture readonly %A, ptr nocapture readonly %B) {			define void @wrap_mul4(ptr nocapture %Out, ptr nocapture readonly %A, ptr nocapture readonly %B) {
	; CHECK-LABEL: @wrap_mul4(			; CHECK-LABEL: @wrap_mul4(
	; CHECK-NEXT: [[TEMP:%.*]] = load double, ptr [[A:%.*]], align 8			; CHECK-NEXT: [[TEMP:%.*]] = load double, ptr [[A:%.*]], align 8
	; CHECK-NEXT: [[TEMP1:%.*]] = load double, ptr [[B:%.*]], align 8
	; CHECK-NEXT: [[MUL_I:%.*]] = fmul double [[TEMP]], [[TEMP1]]
	; CHECK-NEXT: [[ARRAYIDX5_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 0, i64 1			; CHECK-NEXT: [[ARRAYIDX5_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 0, i64 1
	; CHECK-NEXT: [[TEMP2:%.*]] = load double, ptr [[ARRAYIDX5_I]], align 8			; CHECK-NEXT: [[TEMP2:%.*]] = load double, ptr [[ARRAYIDX5_I]], align 8
	; CHECK-NEXT: [[ARRAYIDX7_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 1, i64 0			; CHECK-NEXT: [[ARRAYIDX7_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B:%.*]], i64 1, i64 0
	; CHECK-NEXT: [[TEMP3:%.*]] = load double, ptr [[ARRAYIDX7_I]], align 8
	; CHECK-NEXT: [[MUL8_I:%.*]] = fmul double [[TEMP2]], [[TEMP3]]
	; CHECK-NEXT: [[ADD_I:%.*]] = fadd double [[MUL_I]], [[MUL8_I]]
	; CHECK-NEXT: [[ARRAYIDX13_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 0, i64 1
	; CHECK-NEXT: [[TEMP4:%.*]] = load double, ptr [[ARRAYIDX13_I]], align 8
	; CHECK-NEXT: [[MUL14_I:%.*]] = fmul double [[TEMP]], [[TEMP4]]
	; CHECK-NEXT: [[ARRAYIDX18_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 1, i64 1
	; CHECK-NEXT: [[TEMP5:%.*]] = load double, ptr [[ARRAYIDX18_I]], align 8
	; CHECK-NEXT: [[MUL19_I:%.*]] = fmul double [[TEMP2]], [[TEMP5]]
	; CHECK-NEXT: [[ADD20_I:%.*]] = fadd double [[MUL14_I]], [[MUL19_I]]
	; CHECK-NEXT: [[ARRAYIDX25_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 0, i64 2			; CHECK-NEXT: [[ARRAYIDX25_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 0, i64 2
	; CHECK-NEXT: [[TEMP6:%.*]] = load double, ptr [[ARRAYIDX25_I]], align 8
	; CHECK-NEXT: [[MUL26_I:%.*]] = fmul double [[TEMP]], [[TEMP6]]
	; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 1, i64 2			; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 1, i64 2
	; CHECK-NEXT: [[TEMP7:%.*]] = load double, ptr [[ARRAYIDX30_I]], align 8
	; CHECK-NEXT: [[MUL31_I:%.*]] = fmul double [[TEMP2]], [[TEMP7]]
	; CHECK-NEXT: [[ADD32_I:%.*]] = fadd double [[MUL26_I]], [[MUL31_I]]
	; CHECK-NEXT: [[ARRAYIDX37_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 0, i64 3
	; CHECK-NEXT: [[TEMP8:%.*]] = load double, ptr [[ARRAYIDX37_I]], align 8
	; CHECK-NEXT: [[MUL38_I:%.*]] = fmul double [[TEMP]], [[TEMP8]]
	; CHECK-NEXT: [[ARRAYIDX42_I:%.*]] = getelementptr inbounds [4 x double], ptr [[B]], i64 1, i64 3
	; CHECK-NEXT: [[TEMP9:%.*]] = load double, ptr [[ARRAYIDX42_I]], align 8
	; CHECK-NEXT: [[MUL43_I:%.*]] = fmul double [[TEMP2]], [[TEMP9]]
	; CHECK-NEXT: [[ADD44_I:%.*]] = fadd double [[MUL38_I]], [[MUL43_I]]
	; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 1, i64 0			; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 1, i64 0
	; CHECK-NEXT: [[TEMP10:%.*]] = load double, ptr [[ARRAYIDX47_I]], align 8			; CHECK-NEXT: [[TEMP10:%.*]] = load double, ptr [[ARRAYIDX47_I]], align 8
	; CHECK-NEXT: [[MUL50_I:%.*]] = fmul double [[TEMP1]], [[TEMP10]]
	; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 1, i64 1			; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], ptr [[A]], i64 1, i64 1
	; CHECK-NEXT: [[TEMP11:%.*]] = load double, ptr [[ARRAYIDX52_I]], align 8			; CHECK-NEXT: [[TEMP11:%.*]] = load double, ptr [[ARRAYIDX52_I]], align 8
	; CHECK-NEXT: [[MUL55_I:%.*]] = fmul double [[TEMP3]], [[TEMP11]]			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr [[B]], align 8
	; CHECK-NEXT: [[ADD56_I:%.*]] = fadd double [[MUL50_I]], [[MUL55_I]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TEMP]], i32 0
	; CHECK-NEXT: [[MUL62_I:%.*]] = fmul double [[TEMP4]], [[TEMP10]]			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[MUL67_I:%.*]] = fmul double [[TEMP5]], [[TEMP11]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[ADD68_I:%.*]] = fadd double [[MUL62_I]], [[MUL67_I]]			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, ptr [[ARRAYIDX7_I]], align 8
	; CHECK-NEXT: [[MUL74_I:%.*]] = fmul double [[TEMP6]], [[TEMP10]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[TEMP2]], i32 0
	; CHECK-NEXT: [[MUL79_I:%.*]] = fmul double [[TEMP7]], [[TEMP11]]			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[ADD80_I:%.*]] = fadd double [[MUL74_I]], [[MUL79_I]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP5]]
	; CHECK-NEXT: [[MUL86_I:%.*]] = fmul double [[TEMP8]], [[TEMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP4]], [[TMP8]]
	; CHECK-NEXT: [[MUL91_I:%.*]] = fmul double [[TEMP9]], [[TEMP11]]			; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, ptr [[OUT:%.*]], i64 2
	; CHECK-NEXT: [[ADD92_I:%.*]] = fadd double [[MUL86_I]], [[MUL91_I]]			; CHECK-NEXT: [[TMP10:%.*]] = load <2 x double>, ptr [[ARRAYIDX25_I]], align 8
	; CHECK-NEXT: store double [[ADD_I]], ptr [[OUT:%.*]], align 8			; CHECK-NEXT: [[TMP11:%.*]] = fmul <2 x double> [[TMP3]], [[TMP10]]
	; CHECK-NEXT: [[RES_I_SROA_4_0_OUT2_I_SROA_IDX2:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 1			; CHECK-NEXT: [[TMP12:%.*]] = load <2 x double>, ptr [[ARRAYIDX30_I]], align 8
	; CHECK-NEXT: store double [[ADD20_I]], ptr [[RES_I_SROA_4_0_OUT2_I_SROA_IDX2]], align 8			; CHECK-NEXT: [[TMP13:%.*]] = fmul <2 x double> [[TMP7]], [[TMP12]]
	; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 2			; CHECK-NEXT: [[TMP14:%.*]] = fadd <2 x double> [[TMP11]], [[TMP13]]
	; CHECK-NEXT: store double [[ADD32_I]], ptr [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], ptr [[OUT]], align 8
	; CHECK-NEXT: [[RES_I_SROA_6_0_OUT2_I_SROA_IDX6:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 3			; CHECK-NEXT: store <2 x double> [[TMP14]], ptr [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]], align 8
	; CHECK-NEXT: store double [[ADD44_I]], ptr [[RES_I_SROA_6_0_OUT2_I_SROA_IDX6]], align 8
	; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 4			; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 4
	; CHECK-NEXT: store double [[ADD56_I]], ptr [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]], align 8			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x double> poison, double [[TEMP10]], i32 0
	; CHECK-NEXT: [[RES_I_SROA_8_0_OUT2_I_SROA_IDX10:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 5			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x double> [[TMP15]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: store double [[ADD68_I]], ptr [[RES_I_SROA_8_0_OUT2_I_SROA_IDX10]], align 8			; CHECK-NEXT: [[TMP17:%.*]] = fmul <2 x double> [[TMP1]], [[TMP16]]
				; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x double> poison, double [[TEMP11]], i32 0
				; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <2 x double> [[TMP18]], <2 x double> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP20:%.*]] = fmul <2 x double> [[TMP5]], [[TMP19]]
				; CHECK-NEXT: [[TMP21:%.*]] = fadd <2 x double> [[TMP17]], [[TMP20]]
				; CHECK-NEXT: store <2 x double> [[TMP21]], ptr [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]], align 8
	; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 6			; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 6
	; CHECK-NEXT: store double [[ADD80_I]], ptr [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]], align 8			; CHECK-NEXT: [[TMP22:%.*]] = fmul <2 x double> [[TMP10]], [[TMP16]]
	; CHECK-NEXT: [[RES_I_SROA_10_0_OUT2_I_SROA_IDX14:%.*]] = getelementptr inbounds double, ptr [[OUT]], i64 7			; CHECK-NEXT: [[TMP23:%.*]] = fmul <2 x double> [[TMP12]], [[TMP19]]
	; CHECK-NEXT: store double [[ADD92_I]], ptr [[RES_I_SROA_10_0_OUT2_I_SROA_IDX14]], align 8			; CHECK-NEXT: [[TMP24:%.*]] = fadd <2 x double> [[TMP22]], [[TMP23]]
				; CHECK-NEXT: store <2 x double> [[TMP24]], ptr [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%temp = load double, ptr %A, align 8			%temp = load double, ptr %A, align 8
	%temp1 = load double, ptr %B, align 8			%temp1 = load double, ptr %B, align 8
	%mul.i = fmul double %temp, %temp1			%mul.i = fmul double %temp, %temp1
	%arrayidx5.i = getelementptr inbounds [2 x double], ptr %A, i64 0, i64 1			%arrayidx5.i = getelementptr inbounds [2 x double], ptr %A, i64 0, i64 1
	%temp2 = load double, ptr %arrayidx5.i, align 8			%temp2 = load double, ptr %arrayidx5.i, align 8
	%arrayidx7.i = getelementptr inbounds [4 x double], ptr %B, i64 1, i64 0			%arrayidx7.i = getelementptr inbounds [4 x double], ptr %B, i64 1, i64 0

llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll

	bb23:			bb23:
	ret void			ret void
	}			}

	; In this test there's a single bound, do not generate runtime checks.			; In this test there's a single bound, do not generate runtime checks.
	define void @single_membound(ptr %arg, ptr %arg1, double %x) {			define void @single_membound(ptr %arg, ptr %arg1, double %x) {
	; CHECK-LABEL: @single_membound(			; CHECK-LABEL: @single_membound(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP:%.*]] = fsub double [[X:%.*]], 9.900000e+01
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, ptr [[ARG:%.*]], i64 1			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, ptr [[ARG:%.*]], i64 1
				; CHECK-NEXT: [[TMP:%.*]] = fsub double [[X:%.*]], 9.900000e+01
	; CHECK-NEXT: store double [[TMP]], ptr [[TMP9]], align 8			; CHECK-NEXT: store double [[TMP]], ptr [[TMP9]], align 8
	; CHECK-NEXT: [[TMP12:%.*]] = load double, ptr [[ARG1:%.*]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = load double, ptr [[ARG1:%.*]], align 8
	; CHECK-NEXT: [[TMP13:%.*]] = fsub double 1.000000e+00, [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = fsub double 1.000000e+00, [[TMP12]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, ptr [[ARG]], i64 2
	; CHECK-NEXT: br label [[BB15:%.*]]			; CHECK-NEXT: br label [[BB15:%.*]]
	; CHECK: bb15:			; CHECK: bb15:
	; CHECK-NEXT: [[TMP16:%.*]] = fmul double [[TMP]], 2.000000e+01			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP]], i32 0
	; CHECK-NEXT: store double [[TMP16]], ptr [[TMP9]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP13]], i32 1
	; CHECK-NEXT: [[TMP17:%.*]] = fmul double [[TMP13]], 3.000000e+01			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 2.000000e+01, double 3.000000e+01>
	; CHECK-NEXT: store double [[TMP17]], ptr [[TMP14]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], ptr [[TMP9]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tmp = fsub double %x, 99.0			%tmp = fsub double %x, 99.0
	%tmp9 = getelementptr inbounds double, ptr %arg, i64 1			%tmp9 = getelementptr inbounds double, ptr %arg, i64 1
	store double %tmp, ptr %tmp9, align 8			store double %tmp, ptr %tmp9, align 8
	%tmp12 = load double, ptr %arg1, align 8			%tmp12 = load double, ptr %arg1, align 8
	%tmp13 = fsub double 1.0, %tmp12			%tmp13 = fsub double 1.0, %tmp12

	; A test case where there are no instructions accessing a tracked object in a			; A test case where there are no instructions accessing a tracked object in a
	; block for which versioning was requested.			; block for which versioning was requested.
	define void @crash_no_tracked_instructions(ptr %arg, ptr %arg.2, ptr %arg.3, i1 %c) {			define void @crash_no_tracked_instructions(ptr %arg, ptr %arg.2, ptr %arg.3, i1 %c) {
	; CHECK-LABEL: @crash_no_tracked_instructions(			; CHECK-LABEL: @crash_no_tracked_instructions(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[T19:%.*]] = load ptr, ptr [[ARG:%.*]], align 8			; CHECK-NEXT: [[T19:%.*]] = load ptr, ptr [[ARG:%.*]], align 8
	; CHECK-NEXT: [[T20:%.*]] = load float, ptr [[ARG_3:%.*]], align 4			; CHECK-NEXT: [[T20:%.*]] = load float, ptr [[ARG_3:%.*]], align 4
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> <float 0.000000e+00, float poison>, float [[T20]], i32 1
	; CHECK-NEXT: br i1 [[C:%.*]], label [[BB22:%.*]], label [[BB30:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[BB22:%.*]], label [[BB30:%.*]]
	; CHECK: bb22:			; CHECK: bb22:
	; CHECK-NEXT: [[T23:%.*]] = fmul float [[T20]], 9.900000e+01			; CHECK-NEXT: [[T23:%.*]] = fmul float [[T20]], 9.900000e+01
	; CHECK-NEXT: [[T24:%.*]] = fmul float [[T23]], 9.900000e+01
	; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds float, ptr [[T19]], i64 2			; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds float, ptr [[T19]], i64 2
	; CHECK-NEXT: [[T26:%.*]] = fmul float [[T23]], 1.000000e+01			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[T23]], i32 0
	; CHECK-NEXT: store float [[T26]], ptr [[T25]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[TMP2]], <float 9.900000e+01, float 1.000000e+01>
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
				; CHECK-NEXT: store float [[TMP4]], ptr [[T25]], align 4
	; CHECK-NEXT: [[T27:%.*]] = load float, ptr [[ARG_2:%.*]], align 8			; CHECK-NEXT: [[T27:%.*]] = load float, ptr [[ARG_2:%.*]], align 8
	; CHECK-NEXT: [[T28:%.*]] = fadd float [[T24]], 2.000000e+01			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x float> [[TMP3]], <float 2.000000e+01, float 2.000000e+01>
	; CHECK-NEXT: [[T29:%.*]] = fadd float [[T26]], 2.000000e+01
	; CHECK-NEXT: br label [[BB30]]			; CHECK-NEXT: br label [[BB30]]
	; CHECK: bb30:			; CHECK: bb30:
	; CHECK-NEXT: [[T31:%.*]] = phi float [ [[T28]], [[BB22]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP5]], [[BB22]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[T32:%.*]] = phi float [ [[T29]], [[BB22]] ], [ [[T20]], [[ENTRY]] ]
	; CHECK-NEXT: br label [[BB36:%.*]]			; CHECK-NEXT: br label [[BB36:%.*]]
	; CHECK: bb36:			; CHECK: bb36:
	; CHECK-NEXT: [[T37:%.*]] = fmul float [[T31]], 3.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP6]], <float 3.000000e+00, float 3.000000e+00>
	; CHECK-NEXT: store float [[T37]], ptr [[ARG_3]], align 4			; CHECK-NEXT: store <2 x float> [[TMP7]], ptr [[ARG_3]], align 4
	; CHECK-NEXT: [[T39:%.*]] = fmul float [[T32]], 3.000000e+00
	; CHECK-NEXT: [[T40:%.*]] = getelementptr inbounds float, ptr [[ARG_3]], i64 1
	; CHECK-NEXT: store float [[T39]], ptr [[T40]], align 4
	; CHECK-NEXT: br label [[BB41:%.*]]			; CHECK-NEXT: br label [[BB41:%.*]]
	; CHECK: bb41:			; CHECK: bb41:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%t19 = load ptr, ptr %arg			%t19 = load ptr, ptr %arg
	%t20 = load float, ptr %arg.3, align 4			%t20 = load float, ptr %arg.3, align 4
	br i1 %c, label %bb22, label %bb30			br i1 %c, label %bb22, label %bb30

llvm/test/Transforms/SLPVectorizer/AArch64/multiple_reduction.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64"			target triple = "aarch64"

	; This test has mutual reductions, referencing the same data:			; This test has mutual reductions, referencing the same data:
	; for i = ...			; for i = ...
	; sm += x[i];			; sm += x[i];
	; sq += x[i] * x[i];			; sq += x[i] * x[i];
	; It currently doesn't SLP vectorize, but should.			; It currently doesn't SLP vectorize, but should.
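
For orientation, the pair of reductions that the comment describes can be sketched in C. This is a minimal illustration under stated assumptions, not the source the test was reduced from: the names `sums_t` and `sum_and_sum_of_squares` are hypothetical, and the 32-bit accumulators are chosen to match the `zext i16 ... to i32`, `add` and `mul` operations visible in the CHECK lines. The point is that both accumulators consume the same loaded element, so a vectorizer can share one wide load between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type holding both reductions. */
typedef struct {
    uint32_t sm; /* running sum of the elements */
    uint32_t sq; /* running sum of the squared elements */
} sums_t;

/* Two reductions over the same data: each x[i] is loaded once, widened to
 * 32 bits, then fed to both accumulators. */
sums_t sum_and_sum_of_squares(const uint16_t *x, size_t n) {
    assert(x != NULL);
    sums_t r = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        uint32_t v = x[i]; /* corresponds to zext i16 -> i32 */
        r.sm += v;
        r.sq += v * v;     /* unsigned arithmetic, wraps on overflow */
    }
    return r;
}
```

The right-hand column of the diff shows the effect on `@straight`: the eight scalar `i16` loads per row are replaced by a single `<8 x i16>` load.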

	define i64 @straight(ptr nocapture noundef readonly %p, i32 noundef %st) {			define i64 @straight(ptr nocapture noundef readonly %p, i32 noundef %st) {
	; CHECK-LABEL: @straight(			; CHECK-LABEL: @straight(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST:%.*]] to i64			; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST:%.*]] to i64
	; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[P:%.*]], align 2			; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i16>, ptr [[P:%.*]], align 2
	; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i32
	; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV]], [[CONV]]
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX_1]], align 2
	; CHECK-NEXT: [[CONV_1:%.*]] = zext i16 [[TMP1]] to i32
	; CHECK-NEXT: [[ADD_1:%.*]] = add nuw nsw i32 [[CONV]], [[CONV_1]]
	; CHECK-NEXT: [[MUL_1:%.*]] = mul nuw nsw i32 [[CONV_1]], [[CONV_1]]
	; CHECK-NEXT: [[ADD11_1:%.*]] = add nuw i32 [[MUL_1]], [[MUL]]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX_2]], align 2
	; CHECK-NEXT: [[CONV_2:%.*]] = zext i16 [[TMP2]] to i32
	; CHECK-NEXT: [[ADD_2:%.*]] = add nuw nsw i32 [[ADD_1]], [[CONV_2]]
	; CHECK-NEXT: [[MUL_2:%.*]] = mul nuw nsw i32 [[CONV_2]], [[CONV_2]]
	; CHECK-NEXT: [[ADD11_2:%.*]] = add i32 [[MUL_2]], [[ADD11_1]]
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = load i16, ptr [[ARRAYIDX_3]], align 2
	; CHECK-NEXT: [[CONV_3:%.*]] = zext i16 [[TMP3]] to i32
	; CHECK-NEXT: [[ADD_3:%.*]] = add nuw nsw i32 [[ADD_2]], [[CONV_3]]
	; CHECK-NEXT: [[MUL_3:%.*]] = mul nuw nsw i32 [[CONV_3]], [[CONV_3]]
	; CHECK-NEXT: [[ADD11_3:%.*]] = add i32 [[MUL_3]], [[ADD11_2]]
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i16, ptr [[ARRAYIDX_4]], align 2
	; CHECK-NEXT: [[CONV_4:%.*]] = zext i16 [[TMP4]] to i32
	; CHECK-NEXT: [[ADD_4:%.*]] = add nuw nsw i32 [[ADD_3]], [[CONV_4]]
	; CHECK-NEXT: [[MUL_4:%.*]] = mul nuw nsw i32 [[CONV_4]], [[CONV_4]]
	; CHECK-NEXT: [[ADD11_4:%.*]] = add i32 [[MUL_4]], [[ADD11_3]]
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 5
	; CHECK-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX_5]], align 2
	; CHECK-NEXT: [[CONV_5:%.*]] = zext i16 [[TMP5]] to i32
	; CHECK-NEXT: [[ADD_5:%.*]] = add nuw nsw i32 [[ADD_4]], [[CONV_5]]
	; CHECK-NEXT: [[MUL_5:%.*]] = mul nuw nsw i32 [[CONV_5]], [[CONV_5]]
	; CHECK-NEXT: [[ADD11_5:%.*]] = add i32 [[MUL_5]], [[ADD11_4]]
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 6
	; CHECK-NEXT: [[TMP6:%.*]] = load i16, ptr [[ARRAYIDX_6]], align 2
	; CHECK-NEXT: [[CONV_6:%.*]] = zext i16 [[TMP6]] to i32
	; CHECK-NEXT: [[ADD_6:%.*]] = add nuw nsw i32 [[ADD_5]], [[CONV_6]]
	; CHECK-NEXT: [[MUL_6:%.*]] = mul nuw nsw i32 [[CONV_6]], [[CONV_6]]
	; CHECK-NEXT: [[ADD11_6:%.*]] = add i32 [[MUL_6]], [[ADD11_5]]
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 7
	; CHECK-NEXT: [[TMP7:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2
	; CHECK-NEXT: [[CONV_7:%.*]] = zext i16 [[TMP7]] to i32
	; CHECK-NEXT: [[ADD_7:%.*]] = add nuw nsw i32 [[ADD_6]], [[CONV_7]]
	; CHECK-NEXT: [[MUL_7:%.*]] = mul nuw nsw i32 [[CONV_7]], [[CONV_7]]
	; CHECK-NEXT: [[ADD11_7:%.*]] = add i32 [[MUL_7]], [[ADD11_6]]
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP8:%.*]] = load i16, ptr [[ADD_PTR]], align 2			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i16>, ptr [[ADD_PTR]], align 2
	; CHECK-NEXT: [[CONV_140:%.*]] = zext i16 [[TMP8]] to i32
	; CHECK-NEXT: [[ADD_141:%.*]] = add nuw nsw i32 [[ADD_7]], [[CONV_140]]
	; CHECK-NEXT: [[MUL_142:%.*]] = mul nuw nsw i32 [[CONV_140]], [[CONV_140]]
	; CHECK-NEXT: [[ADD11_143:%.*]] = add i32 [[MUL_142]], [[ADD11_7]]
	; CHECK-NEXT: [[ARRAYIDX_1_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 1
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX_1_1]], align 2
	; CHECK-NEXT: [[CONV_1_1:%.*]] = zext i16 [[TMP9]] to i32
	; CHECK-NEXT: [[ADD_1_1:%.*]] = add nuw nsw i32 [[ADD_141]], [[CONV_1_1]]
	; CHECK-NEXT: [[MUL_1_1:%.*]] = mul nuw nsw i32 [[CONV_1_1]], [[CONV_1_1]]
	; CHECK-NEXT: [[ADD11_1_1:%.*]] = add i32 [[MUL_1_1]], [[ADD11_143]]
	; CHECK-NEXT: [[ARRAYIDX_2_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 2
	; CHECK-NEXT: [[TMP10:%.*]] = load i16, ptr [[ARRAYIDX_2_1]], align 2
	; CHECK-NEXT: [[CONV_2_1:%.*]] = zext i16 [[TMP10]] to i32
	; CHECK-NEXT: [[ADD_2_1:%.*]] = add nuw nsw i32 [[ADD_1_1]], [[CONV_2_1]]
	; CHECK-NEXT: [[MUL_2_1:%.*]] = mul nuw nsw i32 [[CONV_2_1]], [[CONV_2_1]]
	; CHECK-NEXT: [[ADD11_2_1:%.*]] = add i32 [[MUL_2_1]], [[ADD11_1_1]]
	; CHECK-NEXT: [[ARRAYIDX_3_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 3
	; CHECK-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX_3_1]], align 2
	; CHECK-NEXT: [[CONV_3_1:%.*]] = zext i16 [[TMP11]] to i32
	; CHECK-NEXT: [[ADD_3_1:%.*]] = add nuw nsw i32 [[ADD_2_1]], [[CONV_3_1]]
	; CHECK-NEXT: [[MUL_3_1:%.*]] = mul nuw nsw i32 [[CONV_3_1]], [[CONV_3_1]]
	; CHECK-NEXT: [[ADD11_3_1:%.*]] = add i32 [[MUL_3_1]], [[ADD11_2_1]]
	; CHECK-NEXT: [[ARRAYIDX_4_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 4
	; CHECK-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX_4_1]], align 2
	; CHECK-NEXT: [[CONV_4_1:%.*]] = zext i16 [[TMP12]] to i32
	; CHECK-NEXT: [[ADD_4_1:%.*]] = add nuw nsw i32 [[ADD_3_1]], [[CONV_4_1]]
	; CHECK-NEXT: [[MUL_4_1:%.*]] = mul nuw nsw i32 [[CONV_4_1]], [[CONV_4_1]]
	; CHECK-NEXT: [[ADD11_4_1:%.*]] = add i32 [[MUL_4_1]], [[ADD11_3_1]]
	; CHECK-NEXT: [[ARRAYIDX_5_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 5
	; CHECK-NEXT: [[TMP13:%.*]] = load i16, ptr [[ARRAYIDX_5_1]], align 2
	; CHECK-NEXT: [[CONV_5_1:%.*]] = zext i16 [[TMP13]] to i32
	; CHECK-NEXT: [[ADD_5_1:%.*]] = add nuw nsw i32 [[ADD_4_1]], [[CONV_5_1]]
	; CHECK-NEXT: [[MUL_5_1:%.*]] = mul nuw nsw i32 [[CONV_5_1]], [[CONV_5_1]]
	; CHECK-NEXT: [[ADD11_5_1:%.*]] = add i32 [[MUL_5_1]], [[ADD11_4_1]]
	; CHECK-NEXT: [[ARRAYIDX_6_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 6
	; CHECK-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX_6_1]], align 2
	; CHECK-NEXT: [[CONV_6_1:%.*]] = zext i16 [[TMP14]] to i32
	; CHECK-NEXT: [[ADD_6_1:%.*]] = add nuw nsw i32 [[ADD_5_1]], [[CONV_6_1]]
	; CHECK-NEXT: [[MUL_6_1:%.*]] = mul nuw nsw i32 [[CONV_6_1]], [[CONV_6_1]]
	; CHECK-NEXT: [[ADD11_6_1:%.*]] = add i32 [[MUL_6_1]], [[ADD11_5_1]]
	; CHECK-NEXT: [[ARRAYIDX_7_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 7
	; CHECK-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX_7_1]], align 2
	; CHECK-NEXT: [[CONV_7_1:%.*]] = zext i16 [[TMP15]] to i32
	; CHECK-NEXT: [[ADD_7_1:%.*]] = add nuw nsw i32 [[ADD_6_1]], [[CONV_7_1]]
	; CHECK-NEXT: [[MUL_7_1:%.*]] = mul nuw nsw i32 [[CONV_7_1]], [[CONV_7_1]]
	; CHECK-NEXT: [[ADD11_7_1:%.*]] = add i32 [[MUL_7_1]], [[ADD11_6_1]]
	; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP16:%.*]] = load i16, ptr [[ADD_PTR_1]], align 2			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i16>, ptr [[ADD_PTR_1]], align 2
	; CHECK-NEXT: [[CONV_244:%.*]] = zext i16 [[TMP16]] to i32
	; CHECK-NEXT: [[ADD_245:%.*]] = add nuw nsw i32 [[ADD_7_1]], [[CONV_244]]
	; CHECK-NEXT: [[MUL_246:%.*]] = mul nuw nsw i32 [[CONV_244]], [[CONV_244]]
	; CHECK-NEXT: [[ADD11_247:%.*]] = add i32 [[MUL_246]], [[ADD11_7_1]]
	; CHECK-NEXT: [[ARRAYIDX_1_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 1
	; CHECK-NEXT: [[TMP17:%.*]] = load i16, ptr [[ARRAYIDX_1_2]], align 2
	; CHECK-NEXT: [[CONV_1_2:%.*]] = zext i16 [[TMP17]] to i32
	; CHECK-NEXT: [[ADD_1_2:%.*]] = add nuw nsw i32 [[ADD_245]], [[CONV_1_2]]
	; CHECK-NEXT: [[MUL_1_2:%.*]] = mul nuw nsw i32 [[CONV_1_2]], [[CONV_1_2]]
	; CHECK-NEXT: [[ADD11_1_2:%.*]] = add i32 [[MUL_1_2]], [[ADD11_247]]
	; CHECK-NEXT: [[ARRAYIDX_2_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 2
	; CHECK-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX_2_2]], align 2
	; CHECK-NEXT: [[CONV_2_2:%.*]] = zext i16 [[TMP18]] to i32
	; CHECK-NEXT: [[ADD_2_2:%.*]] = add nuw nsw i32 [[ADD_1_2]], [[CONV_2_2]]
	; CHECK-NEXT: [[MUL_2_2:%.*]] = mul nuw nsw i32 [[CONV_2_2]], [[CONV_2_2]]
	; CHECK-NEXT: [[ADD11_2_2:%.*]] = add i32 [[MUL_2_2]], [[ADD11_1_2]]
	; CHECK-NEXT: [[ARRAYIDX_3_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 3
	; CHECK-NEXT: [[TMP19:%.*]] = load i16, ptr [[ARRAYIDX_3_2]], align 2
	; CHECK-NEXT: [[CONV_3_2:%.*]] = zext i16 [[TMP19]] to i32
	; CHECK-NEXT: [[ADD_3_2:%.*]] = add nuw nsw i32 [[ADD_2_2]], [[CONV_3_2]]
	; CHECK-NEXT: [[MUL_3_2:%.*]] = mul nuw nsw i32 [[CONV_3_2]], [[CONV_3_2]]
	; CHECK-NEXT: [[ADD11_3_2:%.*]] = add i32 [[MUL_3_2]], [[ADD11_2_2]]
	; CHECK-NEXT: [[ARRAYIDX_4_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 4
	; CHECK-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX_4_2]], align 2
	; CHECK-NEXT: [[CONV_4_2:%.*]] = zext i16 [[TMP20]] to i32
	; CHECK-NEXT: [[ADD_4_2:%.*]] = add nuw nsw i32 [[ADD_3_2]], [[CONV_4_2]]
	; CHECK-NEXT: [[MUL_4_2:%.*]] = mul nuw nsw i32 [[CONV_4_2]], [[CONV_4_2]]
	; CHECK-NEXT: [[ADD11_4_2:%.*]] = add i32 [[MUL_4_2]], [[ADD11_3_2]]
	; CHECK-NEXT: [[ARRAYIDX_5_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 5
	; CHECK-NEXT: [[TMP21:%.*]] = load i16, ptr [[ARRAYIDX_5_2]], align 2
	; CHECK-NEXT: [[CONV_5_2:%.*]] = zext i16 [[TMP21]] to i32
	; CHECK-NEXT: [[ADD_5_2:%.*]] = add nuw nsw i32 [[ADD_4_2]], [[CONV_5_2]]
	; CHECK-NEXT: [[MUL_5_2:%.*]] = mul nuw nsw i32 [[CONV_5_2]], [[CONV_5_2]]
	; CHECK-NEXT: [[ADD11_5_2:%.*]] = add i32 [[MUL_5_2]], [[ADD11_4_2]]
	; CHECK-NEXT: [[ARRAYIDX_6_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 6
	; CHECK-NEXT: [[TMP22:%.*]] = load i16, ptr [[ARRAYIDX_6_2]], align 2
	; CHECK-NEXT: [[CONV_6_2:%.*]] = zext i16 [[TMP22]] to i32
	; CHECK-NEXT: [[ADD_6_2:%.*]] = add nuw nsw i32 [[ADD_5_2]], [[CONV_6_2]]
	; CHECK-NEXT: [[MUL_6_2:%.*]] = mul nuw nsw i32 [[CONV_6_2]], [[CONV_6_2]]
	; CHECK-NEXT: [[ADD11_6_2:%.*]] = add i32 [[MUL_6_2]], [[ADD11_5_2]]
	; CHECK-NEXT: [[ARRAYIDX_7_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 7
	; CHECK-NEXT: [[TMP23:%.*]] = load i16, ptr [[ARRAYIDX_7_2]], align 2
	; CHECK-NEXT: [[CONV_7_2:%.*]] = zext i16 [[TMP23]] to i32
	; CHECK-NEXT: [[ADD_7_2:%.*]] = add nuw nsw i32 [[ADD_6_2]], [[CONV_7_2]]
	; CHECK-NEXT: [[MUL_7_2:%.*]] = mul nuw nsw i32 [[CONV_7_2]], [[CONV_7_2]]
	; CHECK-NEXT: [[ADD11_7_2:%.*]] = add i32 [[MUL_7_2]], [[ADD11_6_2]]
	; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_2:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_1]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP24:%.*]] = load i16, ptr [[ADD_PTR_2]], align 2			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i16>, ptr [[ADD_PTR_2]], align 2
	; CHECK-NEXT: [[CONV_348:%.*]] = zext i16 [[TMP24]] to i32
	; CHECK-NEXT: [[ADD_349:%.*]] = add nuw nsw i32 [[ADD_7_2]], [[CONV_348]]
	; CHECK-NEXT: [[MUL_350:%.*]] = mul nuw nsw i32 [[CONV_348]], [[CONV_348]]
	; CHECK-NEXT: [[ADD11_351:%.*]] = add i32 [[MUL_350]], [[ADD11_7_2]]
	; CHECK-NEXT: [[ARRAYIDX_1_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 1
	; CHECK-NEXT: [[TMP25:%.*]] = load i16, ptr [[ARRAYIDX_1_3]], align 2
	; CHECK-NEXT: [[CONV_1_3:%.*]] = zext i16 [[TMP25]] to i32
	; CHECK-NEXT: [[ADD_1_3:%.*]] = add nuw nsw i32 [[ADD_349]], [[CONV_1_3]]
	; CHECK-NEXT: [[MUL_1_3:%.*]] = mul nuw nsw i32 [[CONV_1_3]], [[CONV_1_3]]
	; CHECK-NEXT: [[ADD11_1_3:%.*]] = add i32 [[MUL_1_3]], [[ADD11_351]]
	; CHECK-NEXT: [[ARRAYIDX_2_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 2
	; CHECK-NEXT: [[TMP26:%.*]] = load i16, ptr [[ARRAYIDX_2_3]], align 2
	; CHECK-NEXT: [[CONV_2_3:%.*]] = zext i16 [[TMP26]] to i32
	; CHECK-NEXT: [[ADD_2_3:%.*]] = add nuw nsw i32 [[ADD_1_3]], [[CONV_2_3]]
	; CHECK-NEXT: [[MUL_2_3:%.*]] = mul nuw nsw i32 [[CONV_2_3]], [[CONV_2_3]]
	; CHECK-NEXT: [[ADD11_2_3:%.*]] = add i32 [[MUL_2_3]], [[ADD11_1_3]]
	; CHECK-NEXT: [[ARRAYIDX_3_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 3
	; CHECK-NEXT: [[TMP27:%.*]] = load i16, ptr [[ARRAYIDX_3_3]], align 2
	; CHECK-NEXT: [[CONV_3_3:%.*]] = zext i16 [[TMP27]] to i32
	; CHECK-NEXT: [[ADD_3_3:%.*]] = add nuw nsw i32 [[ADD_2_3]], [[CONV_3_3]]
	; CHECK-NEXT: [[MUL_3_3:%.*]] = mul nuw nsw i32 [[CONV_3_3]], [[CONV_3_3]]
	; CHECK-NEXT: [[ADD11_3_3:%.*]] = add i32 [[MUL_3_3]], [[ADD11_2_3]]
	; CHECK-NEXT: [[ARRAYIDX_4_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 4
	; CHECK-NEXT: [[TMP28:%.*]] = load i16, ptr [[ARRAYIDX_4_3]], align 2
	; CHECK-NEXT: [[CONV_4_3:%.*]] = zext i16 [[TMP28]] to i32
	; CHECK-NEXT: [[ADD_4_3:%.*]] = add nuw nsw i32 [[ADD_3_3]], [[CONV_4_3]]
	; CHECK-NEXT: [[MUL_4_3:%.*]] = mul nuw nsw i32 [[CONV_4_3]], [[CONV_4_3]]
	; CHECK-NEXT: [[ADD11_4_3:%.*]] = add i32 [[MUL_4_3]], [[ADD11_3_3]]
	; CHECK-NEXT: [[ARRAYIDX_5_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 5
	; CHECK-NEXT: [[TMP29:%.*]] = load i16, ptr [[ARRAYIDX_5_3]], align 2
	; CHECK-NEXT: [[CONV_5_3:%.*]] = zext i16 [[TMP29]] to i32
	; CHECK-NEXT: [[ADD_5_3:%.*]] = add nuw nsw i32 [[ADD_4_3]], [[CONV_5_3]]
	; CHECK-NEXT: [[MUL_5_3:%.*]] = mul nuw nsw i32 [[CONV_5_3]], [[CONV_5_3]]
	; CHECK-NEXT: [[ADD11_5_3:%.*]] = add i32 [[MUL_5_3]], [[ADD11_4_3]]
	; CHECK-NEXT: [[ARRAYIDX_6_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 6
	; CHECK-NEXT: [[TMP30:%.*]] = load i16, ptr [[ARRAYIDX_6_3]], align 2
	; CHECK-NEXT: [[CONV_6_3:%.*]] = zext i16 [[TMP30]] to i32
	; CHECK-NEXT: [[ADD_6_3:%.*]] = add nuw nsw i32 [[ADD_5_3]], [[CONV_6_3]]
	; CHECK-NEXT: [[MUL_6_3:%.*]] = mul nuw nsw i32 [[CONV_6_3]], [[CONV_6_3]]
	; CHECK-NEXT: [[ADD11_6_3:%.*]] = add i32 [[MUL_6_3]], [[ADD11_5_3]]
	; CHECK-NEXT: [[ARRAYIDX_7_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 7
	; CHECK-NEXT: [[TMP31:%.*]] = load i16, ptr [[ARRAYIDX_7_3]], align 2
	; CHECK-NEXT: [[CONV_7_3:%.*]] = zext i16 [[TMP31]] to i32
	; CHECK-NEXT: [[ADD_7_3:%.*]] = add nuw nsw i32 [[ADD_6_3]], [[CONV_7_3]]
	; CHECK-NEXT: [[MUL_7_3:%.*]] = mul nuw nsw i32 [[CONV_7_3]], [[CONV_7_3]]
	; CHECK-NEXT: [[ADD11_7_3:%.*]] = add i32 [[MUL_7_3]], [[ADD11_6_3]]
	; CHECK-NEXT: [[ADD_PTR_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_3:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_2]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP32:%.*]] = load i16, ptr [[ADD_PTR_3]], align 2			; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i16>, ptr [[ADD_PTR_3]], align 2
	; CHECK-NEXT: [[CONV_452:%.*]] = zext i16 [[TMP32]] to i32
	; CHECK-NEXT: [[ADD_453:%.*]] = add nuw nsw i32 [[ADD_7_3]], [[CONV_452]]
	; CHECK-NEXT: [[MUL_454:%.*]] = mul nuw nsw i32 [[CONV_452]], [[CONV_452]]
	; CHECK-NEXT: [[ADD11_455:%.*]] = add i32 [[MUL_454]], [[ADD11_7_3]]
	; CHECK-NEXT: [[ARRAYIDX_1_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 1
	; CHECK-NEXT: [[TMP33:%.*]] = load i16, ptr [[ARRAYIDX_1_4]], align 2
	; CHECK-NEXT: [[CONV_1_4:%.*]] = zext i16 [[TMP33]] to i32
	; CHECK-NEXT: [[ADD_1_4:%.*]] = add nuw nsw i32 [[ADD_453]], [[CONV_1_4]]
	; CHECK-NEXT: [[MUL_1_4:%.*]] = mul nuw nsw i32 [[CONV_1_4]], [[CONV_1_4]]
	; CHECK-NEXT: [[ADD11_1_4:%.*]] = add i32 [[MUL_1_4]], [[ADD11_455]]
	; CHECK-NEXT: [[ARRAYIDX_2_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 2
	; CHECK-NEXT: [[TMP34:%.*]] = load i16, ptr [[ARRAYIDX_2_4]], align 2
	; CHECK-NEXT: [[CONV_2_4:%.*]] = zext i16 [[TMP34]] to i32
	; CHECK-NEXT: [[ADD_2_4:%.*]] = add nuw nsw i32 [[ADD_1_4]], [[CONV_2_4]]
	; CHECK-NEXT: [[MUL_2_4:%.*]] = mul nuw nsw i32 [[CONV_2_4]], [[CONV_2_4]]
	; CHECK-NEXT: [[ADD11_2_4:%.*]] = add i32 [[MUL_2_4]], [[ADD11_1_4]]
	; CHECK-NEXT: [[ARRAYIDX_3_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 3
	; CHECK-NEXT: [[TMP35:%.*]] = load i16, ptr [[ARRAYIDX_3_4]], align 2
	; CHECK-NEXT: [[CONV_3_4:%.*]] = zext i16 [[TMP35]] to i32
	; CHECK-NEXT: [[ADD_3_4:%.*]] = add nuw nsw i32 [[ADD_2_4]], [[CONV_3_4]]
	; CHECK-NEXT: [[MUL_3_4:%.*]] = mul nuw nsw i32 [[CONV_3_4]], [[CONV_3_4]]
	; CHECK-NEXT: [[ADD11_3_4:%.*]] = add i32 [[MUL_3_4]], [[ADD11_2_4]]
	; CHECK-NEXT: [[ARRAYIDX_4_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 4
	; CHECK-NEXT: [[TMP36:%.*]] = load i16, ptr [[ARRAYIDX_4_4]], align 2
	; CHECK-NEXT: [[CONV_4_4:%.*]] = zext i16 [[TMP36]] to i32
	; CHECK-NEXT: [[ADD_4_4:%.*]] = add nuw nsw i32 [[ADD_3_4]], [[CONV_4_4]]
	; CHECK-NEXT: [[MUL_4_4:%.*]] = mul nuw nsw i32 [[CONV_4_4]], [[CONV_4_4]]
	; CHECK-NEXT: [[ADD11_4_4:%.*]] = add i32 [[MUL_4_4]], [[ADD11_3_4]]
	; CHECK-NEXT: [[ARRAYIDX_5_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 5
	; CHECK-NEXT: [[TMP37:%.*]] = load i16, ptr [[ARRAYIDX_5_4]], align 2
	; CHECK-NEXT: [[CONV_5_4:%.*]] = zext i16 [[TMP37]] to i32
	; CHECK-NEXT: [[ADD_5_4:%.*]] = add nuw nsw i32 [[ADD_4_4]], [[CONV_5_4]]
	; CHECK-NEXT: [[MUL_5_4:%.*]] = mul nuw nsw i32 [[CONV_5_4]], [[CONV_5_4]]
	; CHECK-NEXT: [[ADD11_5_4:%.*]] = add i32 [[MUL_5_4]], [[ADD11_4_4]]
	; CHECK-NEXT: [[ARRAYIDX_6_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 6
	; CHECK-NEXT: [[TMP38:%.*]] = load i16, ptr [[ARRAYIDX_6_4]], align 2
	; CHECK-NEXT: [[CONV_6_4:%.*]] = zext i16 [[TMP38]] to i32
	; CHECK-NEXT: [[ADD_6_4:%.*]] = add nuw nsw i32 [[ADD_5_4]], [[CONV_6_4]]
	; CHECK-NEXT: [[MUL_6_4:%.*]] = mul nuw nsw i32 [[CONV_6_4]], [[CONV_6_4]]
	; CHECK-NEXT: [[ADD11_6_4:%.*]] = add i32 [[MUL_6_4]], [[ADD11_5_4]]
	; CHECK-NEXT: [[ARRAYIDX_7_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 7
	; CHECK-NEXT: [[TMP39:%.*]] = load i16, ptr [[ARRAYIDX_7_4]], align 2
	; CHECK-NEXT: [[CONV_7_4:%.*]] = zext i16 [[TMP39]] to i32
	; CHECK-NEXT: [[ADD_7_4:%.*]] = add nuw nsw i32 [[ADD_6_4]], [[CONV_7_4]]
	; CHECK-NEXT: [[MUL_7_4:%.*]] = mul nuw nsw i32 [[CONV_7_4]], [[CONV_7_4]]
	; CHECK-NEXT: [[ADD11_7_4:%.*]] = add i32 [[MUL_7_4]], [[ADD11_6_4]]
	; CHECK-NEXT: [[ADD_PTR_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_4:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_3]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP40:%.*]] = load i16, ptr [[ADD_PTR_4]], align 2			; CHECK-NEXT: [[TMP5:%.*]] = load <8 x i16>, ptr [[ADD_PTR_4]], align 2
	; CHECK-NEXT: [[CONV_556:%.*]] = zext i16 [[TMP40]] to i32
	; CHECK-NEXT: [[ADD_557:%.*]] = add nuw nsw i32 [[ADD_7_4]], [[CONV_556]]
	; CHECK-NEXT: [[MUL_558:%.*]] = mul nuw nsw i32 [[CONV_556]], [[CONV_556]]
	; CHECK-NEXT: [[ADD11_559:%.*]] = add i32 [[MUL_558]], [[ADD11_7_4]]
	; CHECK-NEXT: [[ARRAYIDX_1_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 1
	; CHECK-NEXT: [[TMP41:%.*]] = load i16, ptr [[ARRAYIDX_1_5]], align 2
	; CHECK-NEXT: [[CONV_1_5:%.*]] = zext i16 [[TMP41]] to i32
	; CHECK-NEXT: [[ADD_1_5:%.*]] = add nuw nsw i32 [[ADD_557]], [[CONV_1_5]]
	; CHECK-NEXT: [[MUL_1_5:%.*]] = mul nuw nsw i32 [[CONV_1_5]], [[CONV_1_5]]
	; CHECK-NEXT: [[ADD11_1_5:%.*]] = add i32 [[MUL_1_5]], [[ADD11_559]]
	; CHECK-NEXT: [[ARRAYIDX_2_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 2
	; CHECK-NEXT: [[TMP42:%.*]] = load i16, ptr [[ARRAYIDX_2_5]], align 2
	; CHECK-NEXT: [[CONV_2_5:%.*]] = zext i16 [[TMP42]] to i32
	; CHECK-NEXT: [[ADD_2_5:%.*]] = add nuw nsw i32 [[ADD_1_5]], [[CONV_2_5]]
	; CHECK-NEXT: [[MUL_2_5:%.*]] = mul nuw nsw i32 [[CONV_2_5]], [[CONV_2_5]]
	; CHECK-NEXT: [[ADD11_2_5:%.*]] = add i32 [[MUL_2_5]], [[ADD11_1_5]]
	; CHECK-NEXT: [[ARRAYIDX_3_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 3
	; CHECK-NEXT: [[TMP43:%.*]] = load i16, ptr [[ARRAYIDX_3_5]], align 2
	; CHECK-NEXT: [[CONV_3_5:%.*]] = zext i16 [[TMP43]] to i32
	; CHECK-NEXT: [[ADD_3_5:%.*]] = add nuw nsw i32 [[ADD_2_5]], [[CONV_3_5]]
	; CHECK-NEXT: [[MUL_3_5:%.*]] = mul nuw nsw i32 [[CONV_3_5]], [[CONV_3_5]]
	; CHECK-NEXT: [[ADD11_3_5:%.*]] = add i32 [[MUL_3_5]], [[ADD11_2_5]]
	; CHECK-NEXT: [[ARRAYIDX_4_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 4
	; CHECK-NEXT: [[TMP44:%.*]] = load i16, ptr [[ARRAYIDX_4_5]], align 2
	; CHECK-NEXT: [[CONV_4_5:%.*]] = zext i16 [[TMP44]] to i32
	; CHECK-NEXT: [[ADD_4_5:%.*]] = add nuw nsw i32 [[ADD_3_5]], [[CONV_4_5]]
	; CHECK-NEXT: [[MUL_4_5:%.*]] = mul nuw nsw i32 [[CONV_4_5]], [[CONV_4_5]]
	; CHECK-NEXT: [[ADD11_4_5:%.*]] = add i32 [[MUL_4_5]], [[ADD11_3_5]]
	; CHECK-NEXT: [[ARRAYIDX_5_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 5
	; CHECK-NEXT: [[TMP45:%.*]] = load i16, ptr [[ARRAYIDX_5_5]], align 2
	; CHECK-NEXT: [[CONV_5_5:%.*]] = zext i16 [[TMP45]] to i32
	; CHECK-NEXT: [[ADD_5_5:%.*]] = add nuw nsw i32 [[ADD_4_5]], [[CONV_5_5]]
	; CHECK-NEXT: [[MUL_5_5:%.*]] = mul nuw nsw i32 [[CONV_5_5]], [[CONV_5_5]]
	; CHECK-NEXT: [[ADD11_5_5:%.*]] = add i32 [[MUL_5_5]], [[ADD11_4_5]]
	; CHECK-NEXT: [[ARRAYIDX_6_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 6
	; CHECK-NEXT: [[TMP46:%.*]] = load i16, ptr [[ARRAYIDX_6_5]], align 2
	; CHECK-NEXT: [[CONV_6_5:%.*]] = zext i16 [[TMP46]] to i32
	; CHECK-NEXT: [[ADD_6_5:%.*]] = add nuw nsw i32 [[ADD_5_5]], [[CONV_6_5]]
	; CHECK-NEXT: [[MUL_6_5:%.*]] = mul nuw nsw i32 [[CONV_6_5]], [[CONV_6_5]]
	; CHECK-NEXT: [[ADD11_6_5:%.*]] = add i32 [[MUL_6_5]], [[ADD11_5_5]]
	; CHECK-NEXT: [[ARRAYIDX_7_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 7
	; CHECK-NEXT: [[TMP47:%.*]] = load i16, ptr [[ARRAYIDX_7_5]], align 2
	; CHECK-NEXT: [[CONV_7_5:%.*]] = zext i16 [[TMP47]] to i32
	; CHECK-NEXT: [[ADD_7_5:%.*]] = add nuw nsw i32 [[ADD_6_5]], [[CONV_7_5]]
	; CHECK-NEXT: [[MUL_7_5:%.*]] = mul nuw nsw i32 [[CONV_7_5]], [[CONV_7_5]]
	; CHECK-NEXT: [[ADD11_7_5:%.*]] = add i32 [[MUL_7_5]], [[ADD11_6_5]]
	; CHECK-NEXT: [[ADD_PTR_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_5:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_4]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP48:%.*]] = load i16, ptr [[ADD_PTR_5]], align 2			; CHECK-NEXT: [[TMP6:%.*]] = load <8 x i16>, ptr [[ADD_PTR_5]], align 2
	; CHECK-NEXT: [[CONV_660:%.*]] = zext i16 [[TMP48]] to i32
	; CHECK-NEXT: [[ADD_661:%.*]] = add nuw nsw i32 [[ADD_7_5]], [[CONV_660]]
	; CHECK-NEXT: [[MUL_662:%.*]] = mul nuw nsw i32 [[CONV_660]], [[CONV_660]]
	; CHECK-NEXT: [[ADD11_663:%.*]] = add i32 [[MUL_662]], [[ADD11_7_5]]
	; CHECK-NEXT: [[ARRAYIDX_1_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 1
	; CHECK-NEXT: [[TMP49:%.*]] = load i16, ptr [[ARRAYIDX_1_6]], align 2
	; CHECK-NEXT: [[CONV_1_6:%.*]] = zext i16 [[TMP49]] to i32
	; CHECK-NEXT: [[ADD_1_6:%.*]] = add nuw nsw i32 [[ADD_661]], [[CONV_1_6]]
	; CHECK-NEXT: [[MUL_1_6:%.*]] = mul nuw nsw i32 [[CONV_1_6]], [[CONV_1_6]]
	; CHECK-NEXT: [[ADD11_1_6:%.*]] = add i32 [[MUL_1_6]], [[ADD11_663]]
	; CHECK-NEXT: [[ARRAYIDX_2_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 2
	; CHECK-NEXT: [[TMP50:%.*]] = load i16, ptr [[ARRAYIDX_2_6]], align 2
	; CHECK-NEXT: [[CONV_2_6:%.*]] = zext i16 [[TMP50]] to i32
	; CHECK-NEXT: [[ADD_2_6:%.*]] = add nuw nsw i32 [[ADD_1_6]], [[CONV_2_6]]
	; CHECK-NEXT: [[MUL_2_6:%.*]] = mul nuw nsw i32 [[CONV_2_6]], [[CONV_2_6]]
	; CHECK-NEXT: [[ADD11_2_6:%.*]] = add i32 [[MUL_2_6]], [[ADD11_1_6]]
	; CHECK-NEXT: [[ARRAYIDX_3_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 3
	; CHECK-NEXT: [[TMP51:%.*]] = load i16, ptr [[ARRAYIDX_3_6]], align 2
	; CHECK-NEXT: [[CONV_3_6:%.*]] = zext i16 [[TMP51]] to i32
	; CHECK-NEXT: [[ADD_3_6:%.*]] = add nuw nsw i32 [[ADD_2_6]], [[CONV_3_6]]
	; CHECK-NEXT: [[MUL_3_6:%.*]] = mul nuw nsw i32 [[CONV_3_6]], [[CONV_3_6]]
	; CHECK-NEXT: [[ADD11_3_6:%.*]] = add i32 [[MUL_3_6]], [[ADD11_2_6]]
	; CHECK-NEXT: [[ARRAYIDX_4_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 4
	; CHECK-NEXT: [[TMP52:%.*]] = load i16, ptr [[ARRAYIDX_4_6]], align 2
	; CHECK-NEXT: [[CONV_4_6:%.*]] = zext i16 [[TMP52]] to i32
	; CHECK-NEXT: [[ADD_4_6:%.*]] = add nuw nsw i32 [[ADD_3_6]], [[CONV_4_6]]
	; CHECK-NEXT: [[MUL_4_6:%.*]] = mul nuw nsw i32 [[CONV_4_6]], [[CONV_4_6]]
	; CHECK-NEXT: [[ADD11_4_6:%.*]] = add i32 [[MUL_4_6]], [[ADD11_3_6]]
	; CHECK-NEXT: [[ARRAYIDX_5_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 5
	; CHECK-NEXT: [[TMP53:%.*]] = load i16, ptr [[ARRAYIDX_5_6]], align 2
	; CHECK-NEXT: [[CONV_5_6:%.*]] = zext i16 [[TMP53]] to i32
	; CHECK-NEXT: [[ADD_5_6:%.*]] = add nuw nsw i32 [[ADD_4_6]], [[CONV_5_6]]
	; CHECK-NEXT: [[MUL_5_6:%.*]] = mul nuw nsw i32 [[CONV_5_6]], [[CONV_5_6]]
	; CHECK-NEXT: [[ADD11_5_6:%.*]] = add i32 [[MUL_5_6]], [[ADD11_4_6]]
	; CHECK-NEXT: [[ARRAYIDX_6_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 6
	; CHECK-NEXT: [[TMP54:%.*]] = load i16, ptr [[ARRAYIDX_6_6]], align 2
	; CHECK-NEXT: [[CONV_6_6:%.*]] = zext i16 [[TMP54]] to i32
	; CHECK-NEXT: [[ADD_6_6:%.*]] = add nuw nsw i32 [[ADD_5_6]], [[CONV_6_6]]
	; CHECK-NEXT: [[MUL_6_6:%.*]] = mul nuw nsw i32 [[CONV_6_6]], [[CONV_6_6]]
	; CHECK-NEXT: [[ADD11_6_6:%.*]] = add i32 [[MUL_6_6]], [[ADD11_5_6]]
	; CHECK-NEXT: [[ARRAYIDX_7_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 7
	; CHECK-NEXT: [[TMP55:%.*]] = load i16, ptr [[ARRAYIDX_7_6]], align 2
	; CHECK-NEXT: [[CONV_7_6:%.*]] = zext i16 [[TMP55]] to i32
	; CHECK-NEXT: [[ADD_7_6:%.*]] = add nuw nsw i32 [[ADD_6_6]], [[CONV_7_6]]
	; CHECK-NEXT: [[MUL_7_6:%.*]] = mul nuw nsw i32 [[CONV_7_6]], [[CONV_7_6]]
	; CHECK-NEXT: [[ADD11_7_6:%.*]] = add i32 [[MUL_7_6]], [[ADD11_6_6]]
	; CHECK-NEXT: [[ADD_PTR_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR_6:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_5]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[TMP56:%.*]] = load i16, ptr [[ADD_PTR_6]], align 2			; CHECK-NEXT: [[TMP7:%.*]] = load <8 x i16>, ptr [[ADD_PTR_6]], align 2
	; CHECK-NEXT: [[CONV_764:%.*]] = zext i16 [[TMP56]] to i32			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <8 x i16> [[TMP0]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[ADD_765:%.*]] = add nuw nsw i32 [[ADD_7_6]], [[CONV_764]]			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[MUL_766:%.*]] = mul nuw nsw i32 [[CONV_764]], [[CONV_764]]			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <64 x i16> [[TMP8]], <64 x i16> [[TMP9]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[ADD11_767:%.*]] = add i32 [[MUL_766]], [[ADD11_7_6]]			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <8 x i16> [[TMP2]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[ARRAYIDX_1_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 1			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <64 x i16> [[TMP10]], <64 x i16> [[TMP11]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[TMP57:%.*]] = load i16, ptr [[ARRAYIDX_1_7]], align 2			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[CONV_1_7:%.*]] = zext i16 [[TMP57]] to i32			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <64 x i16> [[TMP12]], <64 x i16> [[TMP13]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[ADD_1_7:%.*]] = add nuw nsw i32 [[ADD_765]], [[CONV_1_7]]			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <8 x i16> [[TMP4]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[MUL_1_7:%.*]] = mul nuw nsw i32 [[CONV_1_7]], [[CONV_1_7]]			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <64 x i16> [[TMP14]], <64 x i16> [[TMP15]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[ADD11_1_7:%.*]] = add i32 [[MUL_1_7]], [[ADD11_767]]			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[ARRAYIDX_2_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 2			; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <64 x i16> [[TMP16]], <64 x i16> [[TMP17]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[TMP58:%.*]] = load i16, ptr [[ARRAYIDX_2_7]], align 2			; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <8 x i16> [[TMP6]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[CONV_2_7:%.*]] = zext i16 [[TMP58]] to i32			; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <64 x i16> [[TMP18]], <64 x i16> [[TMP19]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[ADD_2_7:%.*]] = add nuw nsw i32 [[ADD_1_7]], [[CONV_2_7]]			; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <8 x i16> [[TMP7]], <8 x i16> poison, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; CHECK-NEXT: [[MUL_2_7:%.*]] = mul nuw nsw i32 [[CONV_2_7]], [[CONV_2_7]]			; CHECK-NEXT: [[TMP22:%.*]] = shufflevector <64 x i16> [[TMP20]], <64 x i16> [[TMP21]], <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 64, i32 65, i32 66, i32 67, i32 68, i32 69, i32 70, i32 71>
	; CHECK-NEXT: [[ADD11_2_7:%.*]] = add i32 [[MUL_2_7]], [[ADD11_1_7]]			; CHECK-NEXT: [[TMP23:%.*]] = zext <64 x i16> [[TMP22]] to <64 x i32>
	; CHECK-NEXT: [[ARRAYIDX_3_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 3			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <64 x i32> [[TMP23]], i32 0
	; CHECK-NEXT: [[TMP59:%.*]] = load i16, ptr [[ARRAYIDX_3_7]], align 2			; CHECK-NEXT: [[TMP25:%.*]] = extractelement <64 x i32> [[TMP23]], i32 1
	; CHECK-NEXT: [[CONV_3_7:%.*]] = zext i16 [[TMP59]] to i32			; CHECK-NEXT: [[ADD_1:%.*]] = add nuw nsw i32 [[TMP24]], [[TMP25]]
	; CHECK-NEXT: [[ADD_3_7:%.*]] = add nuw nsw i32 [[ADD_2_7]], [[CONV_3_7]]			; CHECK-NEXT: [[TMP26:%.*]] = mul nuw nsw <64 x i32> [[TMP23]], [[TMP23]]
	; CHECK-NEXT: [[MUL_3_7:%.*]] = mul nuw nsw i32 [[CONV_3_7]], [[CONV_3_7]]			; CHECK-NEXT: [[TMP27:%.*]] = extractelement <64 x i32> [[TMP23]], i32 2
	; CHECK-NEXT: [[ADD11_3_7:%.*]] = add i32 [[MUL_3_7]], [[ADD11_2_7]]			; CHECK-NEXT: [[ADD_2:%.*]] = add nuw nsw i32 [[ADD_1]], [[TMP27]]
	; CHECK-NEXT: [[ARRAYIDX_4_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 4			; CHECK-NEXT: [[TMP28:%.*]] = extractelement <64 x i32> [[TMP23]], i32 3
	; CHECK-NEXT: [[TMP60:%.*]] = load i16, ptr [[ARRAYIDX_4_7]], align 2			; CHECK-NEXT: [[ADD_3:%.*]] = add nuw nsw i32 [[ADD_2]], [[TMP28]]
	; CHECK-NEXT: [[CONV_4_7:%.*]] = zext i16 [[TMP60]] to i32			; CHECK-NEXT: [[TMP29:%.*]] = extractelement <64 x i32> [[TMP23]], i32 4
	; CHECK-NEXT: [[ADD_4_7:%.*]] = add nuw nsw i32 [[ADD_3_7]], [[CONV_4_7]]			; CHECK-NEXT: [[ADD_4:%.*]] = add nuw nsw i32 [[ADD_3]], [[TMP29]]
	; CHECK-NEXT: [[MUL_4_7:%.*]] = mul nuw nsw i32 [[CONV_4_7]], [[CONV_4_7]]			; CHECK-NEXT: [[TMP30:%.*]] = extractelement <64 x i32> [[TMP23]], i32 5
	; CHECK-NEXT: [[ADD11_4_7:%.*]] = add i32 [[MUL_4_7]], [[ADD11_3_7]]			; CHECK-NEXT: [[ADD_5:%.*]] = add nuw nsw i32 [[ADD_4]], [[TMP30]]
	; CHECK-NEXT: [[ARRAYIDX_5_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 5			; CHECK-NEXT: [[TMP31:%.*]] = extractelement <64 x i32> [[TMP23]], i32 6
	; CHECK-NEXT: [[TMP61:%.*]] = load i16, ptr [[ARRAYIDX_5_7]], align 2			; CHECK-NEXT: [[ADD_6:%.*]] = add nuw nsw i32 [[ADD_5]], [[TMP31]]
	; CHECK-NEXT: [[CONV_5_7:%.*]] = zext i16 [[TMP61]] to i32			; CHECK-NEXT: [[TMP32:%.*]] = extractelement <64 x i32> [[TMP23]], i32 7
	; CHECK-NEXT: [[ADD_5_7:%.*]] = add nuw nsw i32 [[ADD_4_7]], [[CONV_5_7]]			; CHECK-NEXT: [[ADD_7:%.*]] = add nuw nsw i32 [[ADD_6]], [[TMP32]]
	; CHECK-NEXT: [[MUL_5_7:%.*]] = mul nuw nsw i32 [[CONV_5_7]], [[CONV_5_7]]			; CHECK-NEXT: [[TMP33:%.*]] = extractelement <64 x i32> [[TMP23]], i32 8
	; CHECK-NEXT: [[ADD11_5_7:%.*]] = add i32 [[MUL_5_7]], [[ADD11_4_7]]			; CHECK-NEXT: [[ADD_141:%.*]] = add nuw nsw i32 [[ADD_7]], [[TMP33]]
	; CHECK-NEXT: [[ARRAYIDX_6_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 6			; CHECK-NEXT: [[TMP34:%.*]] = extractelement <64 x i32> [[TMP23]], i32 9
	; CHECK-NEXT: [[TMP62:%.*]] = load i16, ptr [[ARRAYIDX_6_7]], align 2			; CHECK-NEXT: [[ADD_1_1:%.*]] = add nuw nsw i32 [[ADD_141]], [[TMP34]]
	; CHECK-NEXT: [[CONV_6_7:%.*]] = zext i16 [[TMP62]] to i32			; CHECK-NEXT: [[TMP35:%.*]] = extractelement <64 x i32> [[TMP23]], i32 10
	; CHECK-NEXT: [[ADD_6_7:%.*]] = add nuw nsw i32 [[ADD_5_7]], [[CONV_6_7]]			; CHECK-NEXT: [[ADD_2_1:%.*]] = add nuw nsw i32 [[ADD_1_1]], [[TMP35]]
	; CHECK-NEXT: [[MUL_6_7:%.*]] = mul nuw nsw i32 [[CONV_6_7]], [[CONV_6_7]]			; CHECK-NEXT: [[TMP36:%.*]] = extractelement <64 x i32> [[TMP23]], i32 11
	; CHECK-NEXT: [[ADD11_6_7:%.*]] = add i32 [[MUL_6_7]], [[ADD11_5_7]]			; CHECK-NEXT: [[ADD_3_1:%.*]] = add nuw nsw i32 [[ADD_2_1]], [[TMP36]]
	; CHECK-NEXT: [[ARRAYIDX_7_7:%.*]] = getelementptr inbounds i16, ptr [[ADD_PTR_6]], i64 7			; CHECK-NEXT: [[TMP37:%.*]] = extractelement <64 x i32> [[TMP23]], i32 12
	; CHECK-NEXT: [[TMP63:%.*]] = load i16, ptr [[ARRAYIDX_7_7]], align 2			; CHECK-NEXT: [[ADD_4_1:%.*]] = add nuw nsw i32 [[ADD_3_1]], [[TMP37]]
	; CHECK-NEXT: [[CONV_7_7:%.*]] = zext i16 [[TMP63]] to i32			; CHECK-NEXT: [[TMP38:%.*]] = extractelement <64 x i32> [[TMP23]], i32 13
	; CHECK-NEXT: [[ADD_7_7:%.*]] = add nuw nsw i32 [[ADD_6_7]], [[CONV_7_7]]			; CHECK-NEXT: [[ADD_5_1:%.*]] = add nuw nsw i32 [[ADD_4_1]], [[TMP38]]
	; CHECK-NEXT: [[MUL_7_7:%.*]] = mul nuw nsw i32 [[CONV_7_7]], [[CONV_7_7]]			; CHECK-NEXT: [[TMP39:%.*]] = extractelement <64 x i32> [[TMP23]], i32 14
	; CHECK-NEXT: [[ADD11_7_7:%.*]] = add i32 [[MUL_7_7]], [[ADD11_6_7]]			; CHECK-NEXT: [[ADD_6_1:%.*]] = add nuw nsw i32 [[ADD_5_1]], [[TMP39]]
				; CHECK-NEXT: [[TMP40:%.*]] = extractelement <64 x i32> [[TMP23]], i32 15
				; CHECK-NEXT: [[ADD_7_1:%.*]] = add nuw nsw i32 [[ADD_6_1]], [[TMP40]]
				; CHECK-NEXT: [[TMP41:%.*]] = extractelement <64 x i32> [[TMP23]], i32 16
				; CHECK-NEXT: [[ADD_245:%.*]] = add nuw nsw i32 [[ADD_7_1]], [[TMP41]]
				; CHECK-NEXT: [[TMP42:%.*]] = extractelement <64 x i32> [[TMP23]], i32 17
				; CHECK-NEXT: [[ADD_1_2:%.*]] = add nuw nsw i32 [[ADD_245]], [[TMP42]]
				; CHECK-NEXT: [[TMP43:%.*]] = extractelement <64 x i32> [[TMP23]], i32 18
				; CHECK-NEXT: [[ADD_2_2:%.*]] = add nuw nsw i32 [[ADD_1_2]], [[TMP43]]
				; CHECK-NEXT: [[TMP44:%.*]] = extractelement <64 x i32> [[TMP23]], i32 19
				; CHECK-NEXT: [[ADD_3_2:%.*]] = add nuw nsw i32 [[ADD_2_2]], [[TMP44]]
				; CHECK-NEXT: [[TMP45:%.*]] = extractelement <64 x i32> [[TMP23]], i32 20
				; CHECK-NEXT: [[ADD_4_2:%.*]] = add nuw nsw i32 [[ADD_3_2]], [[TMP45]]
				; CHECK-NEXT: [[TMP46:%.*]] = extractelement <64 x i32> [[TMP23]], i32 21
				; CHECK-NEXT: [[ADD_5_2:%.*]] = add nuw nsw i32 [[ADD_4_2]], [[TMP46]]
				; CHECK-NEXT: [[TMP47:%.*]] = extractelement <64 x i32> [[TMP23]], i32 22
				; CHECK-NEXT: [[ADD_6_2:%.*]] = add nuw nsw i32 [[ADD_5_2]], [[TMP47]]
				; CHECK-NEXT: [[TMP48:%.*]] = extractelement <64 x i32> [[TMP23]], i32 23
				; CHECK-NEXT: [[ADD_7_2:%.*]] = add nuw nsw i32 [[ADD_6_2]], [[TMP48]]
				; CHECK-NEXT: [[TMP49:%.*]] = extractelement <64 x i32> [[TMP23]], i32 24
				; CHECK-NEXT: [[ADD_349:%.*]] = add nuw nsw i32 [[ADD_7_2]], [[TMP49]]
				; CHECK-NEXT: [[TMP50:%.*]] = extractelement <64 x i32> [[TMP23]], i32 25
				; CHECK-NEXT: [[ADD_1_3:%.*]] = add nuw nsw i32 [[ADD_349]], [[TMP50]]
				; CHECK-NEXT: [[TMP51:%.*]] = extractelement <64 x i32> [[TMP23]], i32 26
				; CHECK-NEXT: [[ADD_2_3:%.*]] = add nuw nsw i32 [[ADD_1_3]], [[TMP51]]
				; CHECK-NEXT: [[TMP52:%.*]] = extractelement <64 x i32> [[TMP23]], i32 27
				; CHECK-NEXT: [[ADD_3_3:%.*]] = add nuw nsw i32 [[ADD_2_3]], [[TMP52]]
				; CHECK-NEXT: [[TMP53:%.*]] = extractelement <64 x i32> [[TMP23]], i32 28
				; CHECK-NEXT: [[ADD_4_3:%.*]] = add nuw nsw i32 [[ADD_3_3]], [[TMP53]]
				; CHECK-NEXT: [[TMP54:%.*]] = extractelement <64 x i32> [[TMP23]], i32 29
				; CHECK-NEXT: [[ADD_5_3:%.*]] = add nuw nsw i32 [[ADD_4_3]], [[TMP54]]
				; CHECK-NEXT: [[TMP55:%.*]] = extractelement <64 x i32> [[TMP23]], i32 30
				; CHECK-NEXT: [[ADD_6_3:%.*]] = add nuw nsw i32 [[ADD_5_3]], [[TMP55]]
				; CHECK-NEXT: [[TMP56:%.*]] = extractelement <64 x i32> [[TMP23]], i32 31
				; CHECK-NEXT: [[ADD_7_3:%.*]] = add nuw nsw i32 [[ADD_6_3]], [[TMP56]]
				; CHECK-NEXT: [[TMP57:%.*]] = extractelement <64 x i32> [[TMP23]], i32 32
				; CHECK-NEXT: [[ADD_453:%.*]] = add nuw nsw i32 [[ADD_7_3]], [[TMP57]]
				; CHECK-NEXT: [[TMP58:%.*]] = extractelement <64 x i32> [[TMP23]], i32 33
				; CHECK-NEXT: [[ADD_1_4:%.*]] = add nuw nsw i32 [[ADD_453]], [[TMP58]]
				; CHECK-NEXT: [[TMP59:%.*]] = extractelement <64 x i32> [[TMP23]], i32 34
				; CHECK-NEXT: [[ADD_2_4:%.*]] = add nuw nsw i32 [[ADD_1_4]], [[TMP59]]
				; CHECK-NEXT: [[TMP60:%.*]] = extractelement <64 x i32> [[TMP23]], i32 35
				; CHECK-NEXT: [[ADD_3_4:%.*]] = add nuw nsw i32 [[ADD_2_4]], [[TMP60]]
				; CHECK-NEXT: [[TMP61:%.*]] = extractelement <64 x i32> [[TMP23]], i32 36
				; CHECK-NEXT: [[ADD_4_4:%.*]] = add nuw nsw i32 [[ADD_3_4]], [[TMP61]]
				; CHECK-NEXT: [[TMP62:%.*]] = extractelement <64 x i32> [[TMP23]], i32 37
				; CHECK-NEXT: [[ADD_5_4:%.*]] = add nuw nsw i32 [[ADD_4_4]], [[TMP62]]
				; CHECK-NEXT: [[TMP63:%.*]] = extractelement <64 x i32> [[TMP23]], i32 38
				; CHECK-NEXT: [[ADD_6_4:%.*]] = add nuw nsw i32 [[ADD_5_4]], [[TMP63]]
				; CHECK-NEXT: [[TMP64:%.*]] = extractelement <64 x i32> [[TMP23]], i32 39
				; CHECK-NEXT: [[ADD_7_4:%.*]] = add nuw nsw i32 [[ADD_6_4]], [[TMP64]]
				; CHECK-NEXT: [[TMP65:%.*]] = extractelement <64 x i32> [[TMP23]], i32 40
				; CHECK-NEXT: [[ADD_557:%.*]] = add nuw nsw i32 [[ADD_7_4]], [[TMP65]]
				; CHECK-NEXT: [[TMP66:%.*]] = extractelement <64 x i32> [[TMP23]], i32 41
				; CHECK-NEXT: [[ADD_1_5:%.*]] = add nuw nsw i32 [[ADD_557]], [[TMP66]]
				; CHECK-NEXT: [[TMP67:%.*]] = extractelement <64 x i32> [[TMP23]], i32 42
				; CHECK-NEXT: [[ADD_2_5:%.*]] = add nuw nsw i32 [[ADD_1_5]], [[TMP67]]
				; CHECK-NEXT: [[TMP68:%.*]] = extractelement <64 x i32> [[TMP23]], i32 43
				; CHECK-NEXT: [[ADD_3_5:%.*]] = add nuw nsw i32 [[ADD_2_5]], [[TMP68]]
				; CHECK-NEXT: [[TMP69:%.*]] = extractelement <64 x i32> [[TMP23]], i32 44
				; CHECK-NEXT: [[ADD_4_5:%.*]] = add nuw nsw i32 [[ADD_3_5]], [[TMP69]]
				; CHECK-NEXT: [[TMP70:%.*]] = extractelement <64 x i32> [[TMP23]], i32 45
				; CHECK-NEXT: [[ADD_5_5:%.*]] = add nuw nsw i32 [[ADD_4_5]], [[TMP70]]
				; CHECK-NEXT: [[TMP71:%.*]] = extractelement <64 x i32> [[TMP23]], i32 46
				; CHECK-NEXT: [[ADD_6_5:%.*]] = add nuw nsw i32 [[ADD_5_5]], [[TMP71]]
				; CHECK-NEXT: [[TMP72:%.*]] = extractelement <64 x i32> [[TMP23]], i32 47
				; CHECK-NEXT: [[ADD_7_5:%.*]] = add nuw nsw i32 [[ADD_6_5]], [[TMP72]]
				; CHECK-NEXT: [[TMP73:%.*]] = extractelement <64 x i32> [[TMP23]], i32 48
				; CHECK-NEXT: [[ADD_661:%.*]] = add nuw nsw i32 [[ADD_7_5]], [[TMP73]]
				; CHECK-NEXT: [[TMP74:%.*]] = extractelement <64 x i32> [[TMP23]], i32 49
				; CHECK-NEXT: [[ADD_1_6:%.*]] = add nuw nsw i32 [[ADD_661]], [[TMP74]]
				; CHECK-NEXT: [[TMP75:%.*]] = extractelement <64 x i32> [[TMP23]], i32 50
				; CHECK-NEXT: [[ADD_2_6:%.*]] = add nuw nsw i32 [[ADD_1_6]], [[TMP75]]
				; CHECK-NEXT: [[TMP76:%.*]] = extractelement <64 x i32> [[TMP23]], i32 51
				; CHECK-NEXT: [[ADD_3_6:%.*]] = add nuw nsw i32 [[ADD_2_6]], [[TMP76]]
				; CHECK-NEXT: [[TMP77:%.*]] = extractelement <64 x i32> [[TMP23]], i32 52
				; CHECK-NEXT: [[ADD_4_6:%.*]] = add nuw nsw i32 [[ADD_3_6]], [[TMP77]]
				; CHECK-NEXT: [[TMP78:%.*]] = extractelement <64 x i32> [[TMP23]], i32 53
				; CHECK-NEXT: [[ADD_5_6:%.*]] = add nuw nsw i32 [[ADD_4_6]], [[TMP78]]
				; CHECK-NEXT: [[TMP79:%.*]] = extractelement <64 x i32> [[TMP23]], i32 54
				; CHECK-NEXT: [[ADD_6_6:%.*]] = add nuw nsw i32 [[ADD_5_6]], [[TMP79]]
				; CHECK-NEXT: [[TMP80:%.*]] = extractelement <64 x i32> [[TMP23]], i32 55
				; CHECK-NEXT: [[ADD_7_6:%.*]] = add nuw nsw i32 [[ADD_6_6]], [[TMP80]]
				; CHECK-NEXT: [[TMP81:%.*]] = extractelement <64 x i32> [[TMP23]], i32 56
				; CHECK-NEXT: [[ADD_765:%.*]] = add nuw nsw i32 [[ADD_7_6]], [[TMP81]]
				; CHECK-NEXT: [[TMP82:%.*]] = extractelement <64 x i32> [[TMP23]], i32 57
				; CHECK-NEXT: [[ADD_1_7:%.*]] = add nuw nsw i32 [[ADD_765]], [[TMP82]]
				; CHECK-NEXT: [[TMP83:%.*]] = extractelement <64 x i32> [[TMP23]], i32 58
				; CHECK-NEXT: [[ADD_2_7:%.*]] = add nuw nsw i32 [[ADD_1_7]], [[TMP83]]
				; CHECK-NEXT: [[TMP84:%.*]] = extractelement <64 x i32> [[TMP23]], i32 59
				; CHECK-NEXT: [[ADD_3_7:%.*]] = add nuw nsw i32 [[ADD_2_7]], [[TMP84]]
				; CHECK-NEXT: [[TMP85:%.*]] = extractelement <64 x i32> [[TMP23]], i32 60
				; CHECK-NEXT: [[ADD_4_7:%.*]] = add nuw nsw i32 [[ADD_3_7]], [[TMP85]]
				; CHECK-NEXT: [[TMP86:%.*]] = extractelement <64 x i32> [[TMP23]], i32 61
				; CHECK-NEXT: [[ADD_5_7:%.*]] = add nuw nsw i32 [[ADD_4_7]], [[TMP86]]
				; CHECK-NEXT: [[TMP87:%.*]] = extractelement <64 x i32> [[TMP23]], i32 62
				; CHECK-NEXT: [[ADD_6_7:%.*]] = add nuw nsw i32 [[ADD_5_7]], [[TMP87]]
				; CHECK-NEXT: [[TMP88:%.*]] = extractelement <64 x i32> [[TMP23]], i32 63
				; CHECK-NEXT: [[ADD_7_7:%.*]] = add nuw nsw i32 [[ADD_6_7]], [[TMP88]]
				; CHECK-NEXT: [[TMP89:%.*]] = call i32 @llvm.vector.reduce.add.v64i32(<64 x i32> [[TMP26]])
	; CHECK-NEXT: [[CONV15:%.*]] = zext i32 [[ADD_7_7]] to i64			; CHECK-NEXT: [[CONV15:%.*]] = zext i32 [[ADD_7_7]] to i64
	; CHECK-NEXT: [[CONV16:%.*]] = zext i32 [[ADD11_7_7]] to i64			; CHECK-NEXT: [[CONV16:%.*]] = zext i32 [[TMP89]] to i64
	; CHECK-NEXT: [[SHL:%.*]] = shl nuw i64 [[CONV16]], 32			; CHECK-NEXT: [[SHL:%.*]] = shl nuw i64 [[CONV16]], 32
	; CHECK-NEXT: [[ADD17:%.*]] = or i64 [[SHL]], [[CONV15]]			; CHECK-NEXT: [[ADD17:%.*]] = or i64 [[SHL]], [[CONV15]]
	; CHECK-NEXT: ret i64 [[ADD17]]			; CHECK-NEXT: ret i64 [[ADD17]]
	;			;
	entry:			entry:
	%idx.ext = sext i32 %st to i64			%idx.ext = sext i32 %st to i64
	%0 = load i16, ptr %p, align 2			%0 = load i16, ptr %p, align 2
	%conv = zext i16 %0 to i32			%conv = zext i16 %0 to i32

	define i64 @looped(ptr nocapture noundef readonly %p, i32 noundef %st) {			define i64 @looped(ptr nocapture noundef readonly %p, i32 noundef %st) {
	; CHECK-LABEL: @looped(			; CHECK-LABEL: @looped(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST:%.*]] to i64			; CHECK-NEXT: [[IDX_EXT:%.*]] = sext i32 [[ST:%.*]] to i64
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[Y_038:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC13:%.*]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[Y_038:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC13:%.*]], [[FOR_COND1_PREHEADER]] ]
	; CHECK-NEXT: [[SQ_037:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ADD11_15:%.*]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[SQ_037:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[OP_RDX:%.*]], [[FOR_COND1_PREHEADER]] ]
	; CHECK-NEXT: [[SM_036:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ADD_15:%.*]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[SM_036:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[OP_RDX1:%.*]], [[FOR_COND1_PREHEADER]] ]
	; CHECK-NEXT: [[P_ADDR_035:%.*]] = phi ptr [ [[P:%.*]], [[ENTRY]] ], [ [[ADD_PTR:%.*]], [[FOR_COND1_PREHEADER]] ]			; CHECK-NEXT: [[P_ADDR_035:%.*]] = phi ptr [ [[P:%.*]], [[ENTRY]] ], [ [[ADD_PTR:%.*]], [[FOR_COND1_PREHEADER]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[P_ADDR_035]], align 2			; CHECK-NEXT: [[TMP0:%.*]] = load <16 x i16>, ptr [[P_ADDR_035]], align 2
	; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP0]] to i32			; CHECK-NEXT: [[TMP1:%.*]] = zext <16 x i16> [[TMP0]] to <16 x i32>
	; CHECK-NEXT: [[ADD:%.*]] = add i32 [[SM_036]], [[CONV]]			; CHECK-NEXT: [[TMP2:%.*]] = mul nuw nsw <16 x i32> [[TMP1]], [[TMP1]]
	; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV]], [[CONV]]			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP1]])
	; CHECK-NEXT: [[ADD11:%.*]] = add i32 [[MUL]], [[SQ_037]]			; CHECK-NEXT: [[OP_RDX1]] = add i32 [[TMP3]], [[SM_036]]
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 1			; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP2]])
	; CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX_1]], align 2			; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP4]], [[SQ_037]]
	; CHECK-NEXT: [[CONV_1:%.*]] = zext i16 [[TMP1]] to i32
	; CHECK-NEXT: [[ADD_1:%.*]] = add i32 [[ADD]], [[CONV_1]]
	; CHECK-NEXT: [[MUL_1:%.*]] = mul nuw nsw i32 [[CONV_1]], [[CONV_1]]
	; CHECK-NEXT: [[ADD11_1:%.*]] = add i32 [[MUL_1]], [[ADD11]]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr [[ARRAYIDX_2]], align 2
	; CHECK-NEXT: [[CONV_2:%.*]] = zext i16 [[TMP2]] to i32
	; CHECK-NEXT: [[ADD_2:%.*]] = add i32 [[ADD_1]], [[CONV_2]]
	; CHECK-NEXT: [[MUL_2:%.*]] = mul nuw nsw i32 [[CONV_2]], [[CONV_2]]
	; CHECK-NEXT: [[ADD11_2:%.*]] = add i32 [[MUL_2]], [[ADD11_1]]
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = load i16, ptr [[ARRAYIDX_3]], align 2
	; CHECK-NEXT: [[CONV_3:%.*]] = zext i16 [[TMP3]] to i32
	; CHECK-NEXT: [[ADD_3:%.*]] = add i32 [[ADD_2]], [[CONV_3]]
	; CHECK-NEXT: [[MUL_3:%.*]] = mul nuw nsw i32 [[CONV_3]], [[CONV_3]]
	; CHECK-NEXT: [[ADD11_3:%.*]] = add i32 [[MUL_3]], [[ADD11_2]]
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i16, ptr [[ARRAYIDX_4]], align 2
	; CHECK-NEXT: [[CONV_4:%.*]] = zext i16 [[TMP4]] to i32
	; CHECK-NEXT: [[ADD_4:%.*]] = add i32 [[ADD_3]], [[CONV_4]]
	; CHECK-NEXT: [[MUL_4:%.*]] = mul nuw nsw i32 [[CONV_4]], [[CONV_4]]
	; CHECK-NEXT: [[ADD11_4:%.*]] = add i32 [[MUL_4]], [[ADD11_3]]
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 5
	; CHECK-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX_5]], align 2
	; CHECK-NEXT: [[CONV_5:%.*]] = zext i16 [[TMP5]] to i32
	; CHECK-NEXT: [[ADD_5:%.*]] = add i32 [[ADD_4]], [[CONV_5]]
	; CHECK-NEXT: [[MUL_5:%.*]] = mul nuw nsw i32 [[CONV_5]], [[CONV_5]]
	; CHECK-NEXT: [[ADD11_5:%.*]] = add i32 [[MUL_5]], [[ADD11_4]]
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 6
	; CHECK-NEXT: [[TMP6:%.*]] = load i16, ptr [[ARRAYIDX_6]], align 2
	; CHECK-NEXT: [[CONV_6:%.*]] = zext i16 [[TMP6]] to i32
	; CHECK-NEXT: [[ADD_6:%.*]] = add i32 [[ADD_5]], [[CONV_6]]
	; CHECK-NEXT: [[MUL_6:%.*]] = mul nuw nsw i32 [[CONV_6]], [[CONV_6]]
	; CHECK-NEXT: [[ADD11_6:%.*]] = add i32 [[MUL_6]], [[ADD11_5]]
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 7
	; CHECK-NEXT: [[TMP7:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2
	; CHECK-NEXT: [[CONV_7:%.*]] = zext i16 [[TMP7]] to i32
	; CHECK-NEXT: [[ADD_7:%.*]] = add i32 [[ADD_6]], [[CONV_7]]
	; CHECK-NEXT: [[MUL_7:%.*]] = mul nuw nsw i32 [[CONV_7]], [[CONV_7]]
	; CHECK-NEXT: [[ADD11_7:%.*]] = add i32 [[MUL_7]], [[ADD11_6]]
	; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 8
	; CHECK-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX_8]], align 2
	; CHECK-NEXT: [[CONV_8:%.*]] = zext i16 [[TMP8]] to i32
	; CHECK-NEXT: [[ADD_8:%.*]] = add i32 [[ADD_7]], [[CONV_8]]
	; CHECK-NEXT: [[MUL_8:%.*]] = mul nuw nsw i32 [[CONV_8]], [[CONV_8]]
	; CHECK-NEXT: [[ADD11_8:%.*]] = add i32 [[MUL_8]], [[ADD11_7]]
	; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 9
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, ptr [[ARRAYIDX_9]], align 2
	; CHECK-NEXT: [[CONV_9:%.*]] = zext i16 [[TMP9]] to i32
	; CHECK-NEXT: [[ADD_9:%.*]] = add i32 [[ADD_8]], [[CONV_9]]
	; CHECK-NEXT: [[MUL_9:%.*]] = mul nuw nsw i32 [[CONV_9]], [[CONV_9]]
	; CHECK-NEXT: [[ADD11_9:%.*]] = add i32 [[MUL_9]], [[ADD11_8]]
	; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 10
	; CHECK-NEXT: [[TMP10:%.*]] = load i16, ptr [[ARRAYIDX_10]], align 2
	; CHECK-NEXT: [[CONV_10:%.*]] = zext i16 [[TMP10]] to i32
	; CHECK-NEXT: [[ADD_10:%.*]] = add i32 [[ADD_9]], [[CONV_10]]
	; CHECK-NEXT: [[MUL_10:%.*]] = mul nuw nsw i32 [[CONV_10]], [[CONV_10]]
	; CHECK-NEXT: [[ADD11_10:%.*]] = add i32 [[MUL_10]], [[ADD11_9]]
	; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 11
	; CHECK-NEXT: [[TMP11:%.*]] = load i16, ptr [[ARRAYIDX_11]], align 2
	; CHECK-NEXT: [[CONV_11:%.*]] = zext i16 [[TMP11]] to i32
	; CHECK-NEXT: [[ADD_11:%.*]] = add i32 [[ADD_10]], [[CONV_11]]
	; CHECK-NEXT: [[MUL_11:%.*]] = mul nuw nsw i32 [[CONV_11]], [[CONV_11]]
	; CHECK-NEXT: [[ADD11_11:%.*]] = add i32 [[MUL_11]], [[ADD11_10]]
	; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 12
	; CHECK-NEXT: [[TMP12:%.*]] = load i16, ptr [[ARRAYIDX_12]], align 2
	; CHECK-NEXT: [[CONV_12:%.*]] = zext i16 [[TMP12]] to i32
	; CHECK-NEXT: [[ADD_12:%.*]] = add i32 [[ADD_11]], [[CONV_12]]
	; CHECK-NEXT: [[MUL_12:%.*]] = mul nuw nsw i32 [[CONV_12]], [[CONV_12]]
	; CHECK-NEXT: [[ADD11_12:%.*]] = add i32 [[MUL_12]], [[ADD11_11]]
	; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 13
	; CHECK-NEXT: [[TMP13:%.*]] = load i16, ptr [[ARRAYIDX_13]], align 2
	; CHECK-NEXT: [[CONV_13:%.*]] = zext i16 [[TMP13]] to i32
	; CHECK-NEXT: [[ADD_13:%.*]] = add i32 [[ADD_12]], [[CONV_13]]
	; CHECK-NEXT: [[MUL_13:%.*]] = mul nuw nsw i32 [[CONV_13]], [[CONV_13]]
	; CHECK-NEXT: [[ADD11_13:%.*]] = add i32 [[MUL_13]], [[ADD11_12]]
	; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 14
	; CHECK-NEXT: [[TMP14:%.*]] = load i16, ptr [[ARRAYIDX_14]], align 2
	; CHECK-NEXT: [[CONV_14:%.*]] = zext i16 [[TMP14]] to i32
	; CHECK-NEXT: [[ADD_14:%.*]] = add i32 [[ADD_13]], [[CONV_14]]
	; CHECK-NEXT: [[MUL_14:%.*]] = mul nuw nsw i32 [[CONV_14]], [[CONV_14]]
	; CHECK-NEXT: [[ADD11_14:%.*]] = add i32 [[MUL_14]], [[ADD11_13]]
	; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 15
	; CHECK-NEXT: [[TMP15:%.*]] = load i16, ptr [[ARRAYIDX_15]], align 2
	; CHECK-NEXT: [[CONV_15:%.*]] = zext i16 [[TMP15]] to i32
	; CHECK-NEXT: [[ADD_15]] = add i32 [[ADD_14]], [[CONV_15]]
	; CHECK-NEXT: [[MUL_15:%.*]] = mul nuw nsw i32 [[CONV_15]], [[CONV_15]]
	; CHECK-NEXT: [[ADD11_15]] = add i32 [[MUL_15]], [[ADD11_14]]
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i16, ptr [[P_ADDR_035]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC13]] = add nuw nsw i32 [[Y_038]], 1			; CHECK-NEXT: [[INC13]] = add nuw nsw i32 [[Y_038]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC13]], 16			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC13]], 16
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[CONV15:%.*]] = zext i32 [[ADD_15]] to i64			; CHECK-NEXT: [[CONV15:%.*]] = zext i32 [[OP_RDX1]] to i64
	; CHECK-NEXT: [[CONV16:%.*]] = zext i32 [[ADD11_15]] to i64			; CHECK-NEXT: [[CONV16:%.*]] = zext i32 [[OP_RDX]] to i64
	; CHECK-NEXT: [[SHL:%.*]] = shl nuw i64 [[CONV16]], 32			; CHECK-NEXT: [[SHL:%.*]] = shl nuw i64 [[CONV16]], 32
	; CHECK-NEXT: [[ADD17:%.*]] = or i64 [[SHL]], [[CONV15]]			; CHECK-NEXT: [[ADD17:%.*]] = or i64 [[SHL]], [[CONV15]]
	; CHECK-NEXT: ret i64 [[ADD17]]			; CHECK-NEXT: ret i64 [[ADD17]]
	;			;
	entry:			entry:
	%idx.ext = sext i32 %st to i64			%idx.ext = sext i32 %st to i64
	br label %for.cond1.preheader			br label %for.cond1.preheader


llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

%add = fadd nnan float %mul.3, %mul.2		%add = fadd nnan float %mul.3, %mul.2
store float %sub, ptr %A, align 4		store float %sub, ptr %A, align 4
%gep.A.1 = getelementptr inbounds float, ptr %A, i64 1		%gep.A.1 = getelementptr inbounds float, ptr %A, i64 1
store float %add, ptr %gep.A.1, align 4		store float %add, ptr %gep.A.1, align 4
store float %B.2, ptr %B, align 4		store float %B.2, ptr %B, align 4
ret void		ret void
}		}

; Test case where not vectorizing is more profitable because multiple		; Test case where not vectorizing is more profitable because multiple
		efriedma: Regression?
		dmgreen (author): I had looked into these. This test case was added without the underlying issue (fusing fmul and fadd) being fixed. The tests were changed by the increased cost of ld1r instructions. In this case it is just profitable again now. You can see it picks 2x vectorization though, not 4x, which seems to come from the `insertelement <2 x float> <float poison, float 3.000000e+00>, float [[X:%.*]], i32 0`, which counts as the cost of a constant vector with x inserted into the bottom lane, both of which are incorrectly counted as zero.
; fmul/{fadd,fsub} pairs can be lowered to fma instructions.		; fmul/{fadd,fsub} pairs can be lowered to fma instructions.
define float @slp_not_profitable_in_loop(float %x, ptr %A) {		define float @slp_not_profitable_in_loop(float %x, ptr %A) {
; CHECK-LABEL: @slp_not_profitable_in_loop(		; CHECK-LABEL: @slp_not_profitable_in_loop(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[GEP_A_1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 1		; CHECK-NEXT: [[GEP_A_2:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 2
; CHECK-NEXT: [[L_0:%.*]] = load float, ptr [[GEP_A_1]], align 4
; CHECK-NEXT: [[GEP_A_2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 2
; CHECK-NEXT: [[L_1:%.*]] = load float, ptr [[GEP_A_2]], align 4		; CHECK-NEXT: [[L_1:%.*]] = load float, ptr [[GEP_A_2]], align 4
; CHECK-NEXT: [[L_2:%.*]] = load float, ptr [[A]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 4
; CHECK-NEXT: [[L_3:%.*]] = load float, ptr [[A]], align 4		; CHECK-NEXT: [[L_3:%.*]] = load float, ptr [[A]], align 4
		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+00>, float [[X:%.*]], i32 0
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[RED:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[RED_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[RED:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[RED_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[MUL11:%.*]] = fmul fast float 3.000000e+00, [[L_0]]		; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]
; CHECK-NEXT: [[MUL12:%.*]] = fmul fast float 3.000000e+00, [[L_1]]		; CHECK-NEXT: [[MUL12:%.*]] = fmul fast float 3.000000e+00, [[L_1]]
; CHECK-NEXT: [[MUL14:%.*]] = fmul fast float [[X:%.*]], [[L_2]]
; CHECK-NEXT: [[MUL16:%.*]] = fmul fast float 3.000000e+00, [[L_3]]		; CHECK-NEXT: [[MUL16:%.*]] = fmul fast float 3.000000e+00, [[L_3]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL12]], [[MUL11]]		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
; CHECK-NEXT: [[ADD13:%.*]] = fadd fast float [[ADD]], [[MUL14]]		; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[MUL12]], [[TMP3]]
		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
		; CHECK-NEXT: [[ADD13:%.*]] = fadd fast float [[ADD]], [[TMP4]]
; CHECK-NEXT: [[RED_NEXT]] = fadd fast float [[ADD13]], [[MUL16]]		; CHECK-NEXT: [[RED_NEXT]] = fadd fast float [[ADD13]], [[MUL16]]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1		; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IV]], 10		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[IV]], 10
; CHECK-NEXT: br i1 [[CMP]], label [[EXIT:%.*]], label [[LOOP]]		; CHECK-NEXT: br i1 [[CMP]], label [[EXIT:%.*]], label [[LOOP]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret float [[RED_NEXT]]		; CHECK-NEXT: ret float [[RED_NEXT]]
;		;
entry:		entry:

llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll

%r = extractelement <4 x i31> %lv, i32 1		%r = extractelement <4 x i31> %lv, i32 1
ret i31 %r		ret i31 %r
}		}

; Scalarizing the load for multiple constant indices may not be profitable.		; Scalarizing the load for multiple constant indices may not be profitable.
define i32 @load_multiple_extracts_with_constant_idx(ptr %x) {		define i32 @load_multiple_extracts_with_constant_idx(ptr %x) {
; CHECK-LABEL: @load_multiple_extracts_with_constant_idx(		; CHECK-LABEL: @load_multiple_extracts_with_constant_idx(
; CHECK-NEXT: [[LV:%.*]] = load <4 x i32>, ptr [[X:%.*]], align 16		; CHECK-NEXT: [[LV:%.*]] = load <4 x i32>, ptr [[X:%.*]], align 16
; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[LV]], <4 x i32> poison, <4 x i32> <i32 1, i32 poison, i32 poison, i32 poison>		; CHECK-NEXT: [[E_0:%.*]] = extractelement <4 x i32> [[LV]], i32 0
; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[LV]], [[SHIFT]]		; CHECK-NEXT: [[E_1:%.*]] = extractelement <4 x i32> [[LV]], i32 1
; CHECK-NEXT: [[RES:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0		; CHECK-NEXT: [[RES:%.*]] = add i32 [[E_0]], [[E_1]]
; CHECK-NEXT: ret i32 [[RES]]		; CHECK-NEXT: ret i32 [[RES]]
;		;
%lv = load <4 x i32>, ptr %x		%lv = load <4 x i32>, ptr %x
%e.0 = extractelement <4 x i32> %lv, i32 0		%e.0 = extractelement <4 x i32> %lv, i32 0
%e.1 = extractelement <4 x i32> %lv, i32 1		%e.1 = extractelement <4 x i32> %lv, i32 1
%res = add i32 %e.0, %e.1		%res = add i32 %e.0, %e.1
ret i32 %res		ret i32 %res
}		}

; Scalarizing the load for multiple extracts is profitable in this case,		; Scalarizing the load for multiple extracts is profitable in this case,
; because the vector large vector requires 2 vector registers.		; because the vector large vector requires 2 vector registers.
		efriedma: Regression?
		dmgreen (author): The cost of an extract of lane zero is still 0 (which is known to be wrong but doesn't look like something we can change without causing too many regressions. I was really hoping to remove it for integer types at the same time as this, but it looks like it causes too many problems to remove. I'm hoping that can be improved in the future, and that will hopefully be easier if the base scalar cost is lower). This seems to already be handled by instruction selection https://godbolt.org/z/7GEcxo8WT, so it shouldn't be a problem on its own. I can change the test to use lane 1 to show it still applies for other lanes.
		efriedma: Okay.
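The reasoning in this thread can be illustrated with a minimal C++ sketch. It is not the VectorCombine or AArch64 TTI implementation: the helper names (`extractCost`, `vectorFormCost`, `scalarFormCost`, `shouldScalarize`) are hypothetical, the load and address costs are illustrative numbers rather than values taken from the AArch64 cost tables, and the "scalarize only when strictly cheaper" rule is an assumption. It only shows how a free lane-0 extract combined with a lower base extract cost can flip the decision for lanes 0 and 6, while lanes 1 and 6 would still be scalarized.

```cpp
#include <vector>

// Hypothetical helper: per-lane extract cost, with lane 0 modelled as free
// (the behaviour described in the review thread).
int extractCost(unsigned lane, int baseCost) {
  return lane == 0 ? 0 : baseCost;
}

// Cost of keeping the vector load and extracting each requested lane.
int vectorFormCost(const std::vector<unsigned> &lanes, int vectorLoadCost,
                   int baseCost) {
  int cost = vectorLoadCost;
  for (unsigned lane : lanes)
    cost += extractCost(lane, baseCost);
  return cost;
}

// Cost of replacing each extract with one scalar load plus one address
// computation.
int scalarFormCost(const std::vector<unsigned> &lanes, int scalarLoadCost,
                   int addressCost) {
  return static_cast<int>(lanes.size()) * (scalarLoadCost + addressCost);
}

// Assumed rule: scalarize only when the scalar form is strictly cheaper.
bool shouldScalarize(const std::vector<unsigned> &lanes, int baseCost) {
  const int vectorLoadCost = 2; // illustrative: <8 x i32> needs two registers
  const int scalarLoadCost = 1; // illustrative
  const int addressCost = 1;    // illustrative
  return scalarFormCost(lanes, scalarLoadCost, addressCost) <
         vectorFormCost(lanes, vectorLoadCost, baseCost);
}
```

Under these assumed numbers, `shouldScalarize({0u, 6u}, 3)` is true and `shouldScalarize({0u, 6u}, 2)` is false, which mirrors the change in the CHECK lines below, whereas `shouldScalarize({1u, 6u}, 2)` remains true.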
define i32 @load_multiple_extracts_with_constant_idx_profitable(ptr %x) {		define i32 @load_multiple_extracts_with_constant_idx_profitable(ptr %x) {
; CHECK-LABEL: @load_multiple_extracts_with_constant_idx_profitable(		; CHECK-LABEL: @load_multiple_extracts_with_constant_idx_profitable(
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds <8 x i32>, ptr [[X:%.*]], i32 0, i32 0		; CHECK-NEXT: [[LV:%.*]] = load <8 x i32>, ptr [[X:%.*]], align 16
; CHECK-NEXT: [[E_0:%.*]] = load i32, ptr [[TMP1]], align 16		; CHECK-NEXT: [[E_0:%.*]] = extractelement <8 x i32> [[LV]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds <8 x i32>, ptr [[X]], i32 0, i32 6		; CHECK-NEXT: [[E_1:%.*]] = extractelement <8 x i32> [[LV]], i32 6
; CHECK-NEXT: [[E_1:%.*]] = load i32, ptr [[TMP2]], align 8
; CHECK-NEXT: [[RES:%.*]] = add i32 [[E_0]], [[E_1]]		; CHECK-NEXT: [[RES:%.*]] = add i32 [[E_0]], [[E_1]]
; CHECK-NEXT: ret i32 [[RES]]		; CHECK-NEXT: ret i32 [[RES]]
;		;
%lv = load <8 x i32>, ptr %x, align 16		%lv = load <8 x i32>, ptr %x, align 16
%e.0 = extractelement <8 x i32> %lv, i32 0		%e.0 = extractelement <8 x i32> %lv, i32 0
%e.1 = extractelement <8 x i32> %lv, i32 6		%e.1 = extractelement <8 x i32> %lv, i32 6
%res = add i32 %e.0, %e.1		%res = add i32 %e.0, %e.1
ret i32 %res		ret i32 %res
Diff 545259
