This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][TTI] Cost model FADD/FSUB/FNEG
ClosedPublic

Authored by SjoerdMeijer on Mar 14 2023, 4:51 AM.

Details

Summary

This lowers the cost of FADD, FSUB, and FNEG. The motivation is to avoid over-eager SLP vectorisation: the current costs make SLP vectorisation look profitable in cases where it results in significant slowdowns. Lowering the scalar FADD and FSUB costs helps the profitability decision favour the scalar version where vectorisation isn't beneficial. Performance results show a 7% improvement for Imagick from SPEC FP 2017, a small improvement in Blender, and unchanged results for the other SPEC apps. RAJAPerf is neutral (mostly no changes).

For a bit more context, this is related to over-eager vectorisation in https://github.com/llvm/llvm-project/issues/61047 but the motivating test case is slightly different.
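For readers without the diff open, a rough sketch of the kind of change this makes in AArch64TTIImpl::getArithmeticInstrCost, where LT is the result of getTypeLegalizationCost (illustrative only; the exact cases and conditions are in the diff):

  // Before: FP add/sub shared a doubled cost with FMUL/FDIV.
  case ISD::FADD:
  case ISD::FSUB:
  case ISD::FMUL:
  case ISD::FDIV:
    return LT.first * 2;

  // After: FADD/FSUB (and, later in the review, FNEG) return the base
  // legalization cost instead, so they are no longer twice as expensive.
  case ISD::FADD:
  case ISD::FSUB:
    return LT.first;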

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Mar 14 2023, 4:51 AM
SjoerdMeijer requested review of this revision.Mar 14 2023, 4:51 AM
Herald added a project: Restricted Project.Mar 14 2023, 4:51 AM

I have now benchmarked RAJAPerf too, which is a loop-based, FP-heavy benchmark suite. The results are neutral, mostly no changes.
So overall I am only seeing the uplift from this, with no regressions.

SjoerdMeijer added a subscriber: sushgokh.

Added context, fixed the test cases.

SjoerdMeijer retitled this revision from [AArch64][TTI] Cost model FADD/FSUB. WIP. to [AArch64][TTI] Cost model FADD/FSUB.Mar 16 2023, 8:15 AM
SjoerdMeijer edited the summary of this revision. (Show Details)

I think this makes a lot of sense. Especially if we are treating many shuffles as having a cost of 1, floating-point operations should not be twice the cost. Judging from the software optimization guides, we could consider doing the same for fmul, but the changes for fadd/fsub already have a high likelihood of causing some large changes. Adding fneg is worth it though, as that should be a simple operation.

This might lead to a fair number of changes. The main regression I found, for example, was a case of identical code being produced by the loop vectorizer, but the scalar and vector costs being closer led it to require a higher minimum trip count for entering the vector body (from D109368). The real problem there is aliasing preventing the LD4 loads from being recognized, though, so it's hard to pin the blame on this change. There are a number of improvements too, to make up for that regression.

@fhahn do you have any thoughts on the change from your side, or any benchmark results that could be helpful?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2414

It is probably worth adding ISD::FNEG, and we might need to be more careful about the types. FP128 shouldn't be cheap, for example. Possibly a higher cost with fp16/bf16 too, when they are not available?
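A sketch of what those guards could look like in the getArithmeticInstrCost switch (illustrative; the guards, costs, and exact subtarget queries here are my reading of the suggestion, not the committed code):

  case ISD::FNEG:
  case ISD::FADD:
  case ISD::FSUB:
    // fp16/bf16 without native support are legalized via promotion, so
    // charge them more than the base cost.
    if ((Ty->getScalarType()->isHalfTy() && !ST->hasFullFP16()) ||
        (Ty->getScalarType()->isBFloatTy() && !ST->hasBF16()))
      return 2 * LT.first;
    // FP128 is a libcall, so it should not get the cheap cost; let it
    // fall through to the more expensive handling below.
    if (!Ty->getScalarType()->isFP128Ty())
      return LT.first;
    [[fallthrough]];
  case ISD::FMUL:
  case ISD::FDIV:
    return LT.first * 2;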

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll
7

This source file, when compiled, is not being SLP vectorized already, so I don't think things change in practice. When I compile it, it uses a number of llvm.fmuladd intrinsics, which already seem to have a cost of 1.

Thanks a lot for looking at this, Dave. I completely agree about giving other FP operations like fmul the same treatment. I forgot to mention why I chose to change only FADD/FSUB, but it's exactly for the reason you mentioned: I was cautious about causing too many changes at once and am hoping to limit the fallout by changing FADD/FSUB first before doing the others. So I am more than happy to add FNEG too if we think that won't increase the risk of regressions.

I will also add the extra checks for the more exotic FP types, as you suggested.

SjoerdMeijer retitled this revision from [AArch64][TTI] Cost model FADD/FSUB to [AArch64][TTI] Cost model FADD/FSUB/FNEG.
SjoerdMeijer edited the summary of this revision. (Show Details)

Added FNEG, and checks for BF16 and FP16.

Gentle ping. Any further opinions on this?

dmgreen accepted this revision.Apr 9 2023, 8:09 AM

I think this looks OK. It has quite a high chance of causing some issues somewhere, but the only two I have seen appear to be unrelated, seemingly caused by the costs of non-legal interleaving groups.

LGTM, but please keep an eye out for any reported issues. Thanks.

This revision is now accepted and ready to land.Apr 9 2023, 8:09 AM

Agreed, and will keep an eye on how this goes.

Thanks for your help with this.

This revision was landed with ongoing or failed builds.Apr 11 2023, 1:46 AM
This revision was automatically updated to reflect the committed changes.

FYI I got some warnings from this new code, using clang 14.0.5:

/home/david.spickett/llvm-project/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2421:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
  case ISD::FMUL:
  ^
/home/david.spickett/llvm-project/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2421:3: note: insert 'LLVM_FALLTHROUGH;' to silence this warning
  case ISD::FMUL:
  ^
  LLVM_FALLTHROUGH;
/home/david.spickett/llvm-project/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2421:3: note: insert 'break;' to avoid fall-through
  case ISD::FMUL:
  ^
  break;

Though the existing code directly above doesn't annotate it either, so if you feel like fixing it, maybe do that in another patch.

Yep, thanks, I noticed it too and will recommit soon with the annotation included, because there is a bot that errors on it:

https://lab.llvm.org/buildbot/#/builders/57/builds/25973
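
For reference, the fix is a one-line annotation that makes the fall-through into the FMUL/FDIV handling explicit, along these lines (sketched; the recommit linked at the end of the thread is authoritative, and the LLVM_FALLTHROUGH macro that clang 14 suggests would silence the warning equally well):

    if (!Ty->getScalarType()->isFP128Ty())
      return LT.first;
    [[fallthrough]]; // FP128 deliberately shares the FMUL/FDIV cost
  case ISD::FMUL:
  case ISD::FDIV:
    return LT.first * 2;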

MaskRay added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2420

Also an "error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]" to address before you reland the patch :)

SjoerdMeijer added inline comments.Apr 11 2023, 10:56 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2420

Included/fixed in the recommit:
https://reviews.llvm.org/rGd827865e9f77