This is an archive of the discontinued LLVM Phabricator instance.

Differential D96805

[AMDGPU][CostModel] Refine cost model for control-flow instructions.
Closed, Public

Authored by dfukalov on Feb 16 2021, 11:21 AM.

Details

Reviewers

rampitec
arsenm

Commits

rG8f4b7e94a2b4: [AMDGPU][CostModel] Refine cost model for control-flow instructions.

Summary

Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase -amdgpu-unroll-threshold-if default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in X86TTIImpl::getCFInstrCost() and
PPCTTIImpl::getCFInstrCost().

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	2,490 ms	x64 debian > libarcher.races::lock-unrelated.c

Event Timeline

dfukalov created this revision. Feb 16 2021, 11:21 AM

Herald added subscribers: kerbowa, pengfei, zzheng and 10 others. Feb 16 2021, 11:21 AM

dfukalov requested review of this revision. Feb 16 2021, 11:21 AM

Herald added a project: Restricted Project. Feb 16 2021, 11:21 AM

Herald added a subscriber: wdng.

The AMDGPU estimates look reasonable, but doubling the unroll threshold looks suspicious. Is that to accommodate some loops with switches?
I am afraid this can have a quite unpredictable impact on overall performance. Did you run any perf tests?

Harbormaster completed remote builds in B89418: Diff 324064. Feb 16 2021, 12:04 PM

It seems to me that this threshold bump is partially compensated by the cbr cost increase in all cases of unrolled loops with ifs, where that cost is multiplied by the trip count.
The threshold was bumped because of test/CodeGen/AMDGPU/unroll.ll, where the following test started to fail

; CHECK-LABEL: @unroll_for_if
; CHECK: entry:
; CHECK-NEXT: getelementptr
; CHECK-NEXT: store
; CHECK-NEXT: getelementptr
; CHECK-NEXT: store
; CHECK-NOT: br
define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {
entry:
  br label %for.body
for.body:                                         ; preds = %entry, %for.inc
  %i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %and = and i32 %i1, 1
  %tobool = icmp eq i32 %and, 0
  br i1 %tobool, label %for.inc, label %if.then
if.then:                                          ; preds = %for.body
  %0 = sext i32 %i1 to i64
  %arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0
  store i32 0, i32 addrspace(5)* %arrayidx, align 4
  br label %for.inc
for.inc:                                          ; preds = %for.body, %if.then
  %inc = add nuw nsw i32 %i1, 1
  %cmp = icmp ult i32 %inc, 48
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.cond
  ret void
}

since the cbr code size cost increased (the threshold needed an increase to 250) and phi became non-free (+50, to 300).
Perhaps for now we should set the cbr code size estimate to 3 rather than 4 (2 exec mask manipulations) and collect more statistics.
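
The arithmetic in the comment above can be made concrete with a rough sketch. It assumes that the full-unroll size estimate has the shape used by LLVM's loop unroller (back-edge instructions counted once, everything else once per iteration) and that the AMDGPU threshold is a base value plus one increment per if in the loop; the real getUnrollingPreferences applies further conditions and caps that are omitted here. The function names and any sample sizes are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of the unrolled-size estimate: instructions on the back edge
// are paid once, the rest of the body is paid once per unrolled iteration.
inline std::uint64_t estimateUnrolledSize(std::uint64_t LoopSize,
                                          std::uint64_t BEInsns,
                                          std::uint64_t Count) {
  assert(LoopSize >= BEInsns && "loop size must include the back-edge insns");
  return (LoopSize - BEInsns) * Count + BEInsns;
}

// Simplified threshold: base value plus a fixed increment per `if` in the
// loop (hypothetical helper; caps and extra conditions are left out).
inline std::uint64_t thresholdWithIfs(std::uint64_t Base, std::uint64_t PerIf,
                                      std::uint64_t NumIfs) {
  return Base + PerIf * NumIfs;
}

// Extra unrolled size caused by raising the per-iteration cost of the body by
// Delta (for example a costlier conditional branch or a non-free phi).
inline std::uint64_t unrolledSizeIncrease(std::uint64_t Delta,
                                          std::uint64_t Count) {
  return Delta * Count;
}
```

With the trip count of 48 from unroll_for_if, every extra unit of per-iteration cost adds 48 to the estimate, which is why a small change in the branch or phi cost calls for a comparatively large threshold adjustment.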

Can you do some performance testing please?

Updated threshold and conditional branch cost-size after performance testing.

Perf numbers look reasonable to me now.

This revision is now accepted and ready to land. Apr 9 2021, 1:36 PM

Harbormaster completed remote builds in B98061: Diff 336553. Apr 9 2021, 2:20 PM

This revision was landed with ongoing or failed builds. Apr 9 2021, 11:20 PM

Closed by commit rG8f4b7e94a2b4: [AMDGPU][CostModel] Refine cost model for control-flow instructions. (authored by dfukalov).

This revision was automatically updated to reflect the committed changes.

dfukalov added a commit: rG8f4b7e94a2b4: [AMDGPU][CostModel] Refine cost model for control-flow instructions.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	14 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	6 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	5 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	8 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	3 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	3 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	6 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	48 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h	4 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp	5 lines
llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h	3 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h	3 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp	5 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h	3 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	7 lines
llvm/test/Analysis/CostModel/AMDGPU/br.ll
llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll	52 lines
llvm/test/CodeGen/AMDGPU/unroll.ll	5 lines
llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-cost-addrspacecast.ll	2 lines

Diff 336553

llvm/include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 1,097 Lines • ▼ Show 20 Lines	int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
const Instruction *I = nullptr) const;		const Instruction *I = nullptr) const;

/// \return The expected cost of a sign- or zero-extended vector extract. Use		/// \return The expected cost of a sign- or zero-extended vector extract. Use
/// -1 to indicate that there is no information about the index value.		/// -1 to indicate that there is no information about the index value.
int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index = -1) const;		unsigned Index = -1) const;

/// \return The expected cost of control-flow related instructions such as		/// \return The expected cost of control-flow related instructions such as
/// Phi, Ret, Br.		/// Phi, Ret, Br, Switch.
int getCFInstrCost(unsigned Opcode,		int getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency) const;		TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,
		const Instruction *I = nullptr) const;

/// \returns The expected cost of compare and select instructions. If there		/// \returns The expected cost of compare and select instructions. If there
/// is an existing instruction that holds Opcode, it may be passed in the		/// is an existing instruction that holds Opcode, it may be passed in the
/// 'I' parameter. The \p VecPred parameter can be used to indicate the select		/// 'I' parameter. The \p VecPred parameter can be used to indicate the select
/// is using a compare with the specified predicate as condition. When vector		/// is using a compare with the specified predicate as condition. When vector
/// types are passed, \p VecPred must be used for all lanes.		/// types are passed, \p VecPred must be used for all lanes.
int getCmpSelInstrCost(		int getCmpSelInstrCost(
unsigned Opcode, Type *ValTy, Type *CondTy = nullptr,		unsigned Opcode, Type *ValTy, Type *CondTy = nullptr,
▲ Show 20 Lines • Show All 451 Lines • ▼ Show 20 Lines	virtual int getShuffleCost(ShuffleKind Kind, VectorType *Tp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp) = 0;		VectorType *SubTp) = 0;
virtual int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		virtual int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
CastContextHint CCH,		CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) = 0;		const Instruction *I) = 0;
virtual int getExtractWithExtendCost(unsigned Opcode, Type *Dst,		virtual int getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) = 0;		VectorType *VecTy, unsigned Index) = 0;
virtual int getCFInstrCost(unsigned Opcode,		virtual int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
TTI::TargetCostKind CostKind) = 0;		const Instruction *I = nullptr) = 0;
virtual int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		virtual int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) = 0;		const Instruction *I) = 0;
virtual int getVectorInstrCost(unsigned Opcode, Type *Val,		virtual int getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) = 0;		unsigned Index) = 0;
virtual int getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		virtual int getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
▲ Show 20 Lines • Show All 449 Lines • ▼ Show 20 Lines	int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
CastContextHint CCH, TTI::TargetCostKind CostKind,		CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);		return Impl.getCastInstrCost(Opcode, Dst, Src, CCH, CostKind, I);
}		}
int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index) override {		unsigned Index) override {
return Impl.getExtractWithExtendCost(Opcode, Dst, VecTy, Index);		return Impl.getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
}		}
int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) override {		int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
return Impl.getCFInstrCost(Opcode, CostKind);		const Instruction *I = nullptr) override {
		return Impl.getCFInstrCost(Opcode, CostKind, I);
}		}
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);		return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);
}		}
int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) override {		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) override {
▲ Show 20 Lines • Show All 271 Lines • Show Last 20 Lines

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 506 Lines • ▼ Show 20 Lines	unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
return 1;		return 1;
}		}

unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,		unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) const {		VectorType *VecTy, unsigned Index) const {
return 1;		return 1;
}		}

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) const {		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr) const {
// A phi would be free, unless we're costing the throughput because it		// A phi would be free, unless we're costing the throughput because it
// will require a register.		// will require a register.
if (Opcode == Instruction::PHI && CostKind != TTI::TCK_RecipThroughput)		if (Opcode == Instruction::PHI && CostKind != TTI::TCK_RecipThroughput)
return 0;		return 0;
return 1;		return 1;
}		}

unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
▲ Show 20 Lines • Show All 404 Lines • ▼ Show 20 Lines	case Instruction::Call: {
assert(isa<IntrinsicInst>(U) && "Unexpected non-intrinsic call");		assert(isa<IntrinsicInst>(U) && "Unexpected non-intrinsic call");
auto *Intrinsic = cast<IntrinsicInst>(U);		auto *Intrinsic = cast<IntrinsicInst>(U);
IntrinsicCostAttributes CostAttrs(Intrinsic->getIntrinsicID(), *CB);		IntrinsicCostAttributes CostAttrs(Intrinsic->getIntrinsicID(), *CB);
return TargetTTI->getIntrinsicInstrCost(CostAttrs, CostKind);		return TargetTTI->getIntrinsicInstrCost(CostAttrs, CostKind);
}		}
case Instruction::Br:		case Instruction::Br:
case Instruction::Ret:		case Instruction::Ret:
case Instruction::PHI:		case Instruction::PHI:
return TargetTTI->getCFInstrCost(Opcode, CostKind);		case Instruction::Switch:
		return TargetTTI->getCFInstrCost(Opcode, CostKind, I);
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::Freeze:		case Instruction::Freeze:
return TTI::TCC_Free;		return TTI::TCC_Free;
case Instruction::Alloca:		case Instruction::Alloca:
if (cast<AllocaInst>(U)->isStaticAlloca())		if (cast<AllocaInst>(U)->isStaticAlloca())
return TTI::TCC_Free;		return TTI::TCC_Free;
break;		break;
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
▲ Show 20 Lines • Show All 211 Lines • Show Last 20 Lines
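
As a reading aid, the target-independent default shown in the hunk above can be restated without LLVM headers. This is only a sketch: the `CFOp` and `Kind` enums and the `defaultCFInstrCost` function are hypothetical stand-ins for `Instruction` opcodes, `TTI::TargetCostKind` and the TTI hook, and the new instruction argument is left out because the default ignores it.

```cpp
// Hypothetical stand-ins for the LLVM opcode and cost-kind enumerations.
enum class CFOp { Br, Switch, Ret, PHI };
enum class Kind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

// Default behaviour from the hunk above: a phi is free unless throughput is
// being costed, and every other control-flow instruction costs 1.
inline unsigned defaultCFInstrCost(CFOp Op, Kind K) {
  if (Op == CFOp::PHI && K != Kind::RecipThroughput)
    return 0;
  return 1;
}
```

Targets that need finer estimates, such as the switch costs this revision adds for AMDGPU, override the hook instead of relying on this flat default.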

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Show First 20 Lines • Show All 891 Lines • ▼ Show 20 Lines	public:
unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,		unsigned getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy, unsigned Index) {		VectorType *VecTy, unsigned Index) {
return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,		return thisT()->getVectorInstrCost(Instruction::ExtractElement, VecTy,
Index) +		Index) +
thisT()->getCastInstrCost(Opcode, Dst, VecTy->getElementType(),		thisT()->getCastInstrCost(Opcode, Dst, VecTy->getElementType(),
TTI::CastContextHint::None, TTI::TCK_RecipThroughput);		TTI::CastContextHint::None, TTI::TCK_RecipThroughput);
}		}

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
return BaseT::getCFInstrCost(Opcode, CostKind);		const Instruction *I = nullptr) {
		return BaseT::getCFInstrCost(Opcode, CostKind, I);
}		}

unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) {		const Instruction *I = nullptr) {
const TargetLoweringBase *TLI = getTLI();		const TargetLoweringBase *TLI = getTLI();
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
▲ Show 20 Lines • Show All 1,178 Lines • Show Last 20 Lines

llvm/lib/Analysis/TargetTransformInfo.cpp

Show First 20 Lines • Show All 777 Lines • ▼ Show 20 Lines	int TargetTransformInfo::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
VectorType *VecTy,		VectorType *VecTy,
unsigned Index) const {		unsigned Index) const {
int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);		int Cost = TTIImpl->getExtractWithExtendCost(Opcode, Dst, VecTy, Index);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

int TargetTransformInfo::getCFInstrCost(unsigned Opcode,		int TargetTransformInfo::getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) const {		TTI::TargetCostKind CostKind,
int Cost = TTIImpl->getCFInstrCost(Opcode, CostKind);		const Instruction *I) const {
		assert((I == nullptr || I->getOpcode() == Opcode) &&
		"Opcode should reflect passed instruction.");
		int Cost = TTIImpl->getCFInstrCost(Opcode, CostKind, I);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

int TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,		int TargetTransformInfo::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
Type *CondTy,		Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
▲ Show 20 Lines • Show All 573 Lines • ▼ Show 20 Lines	TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast:		case Instruction::BitCast:
case Instruction::AddrSpaceCast:		case Instruction::AddrSpaceCast:
case Instruction::ExtractElement:		case Instruction::ExtractElement:
case Instruction::InsertElement:		case Instruction::InsertElement:
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ShuffleVector:		case Instruction::ShuffleVector:
case Instruction::Call:		case Instruction::Call:
		case Instruction::Switch:
return getUserCost(I, CostKind);		return getUserCost(I, CostKind);
default:		default:
// We don't have any information on this instruction.		// We don't have any information on this instruction.
return -1;		return -1;
}		}
}		}

TargetTransformInfo::Concept::~Concept() {}		TargetTransformInfo::Concept::~Concept() {}
▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 133 Lines • ▼ Show 20 Lines	public:

int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index);		unsigned Index);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr);

int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

int getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		int getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
bool IsPairwise, bool IsUnsigned,		bool IsPairwise, bool IsUnsigned,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

int getArithmeticReductionCostSVE(unsigned Opcode, VectorType *ValTy,		int getArithmeticReductionCostSVE(unsigned Opcode, VectorType *ValTy,
▲ Show 20 Lines • Show All 139 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 647 Lines • ▼ Show 20 Lines	int AArch64TTIImpl::getExtractWithExtendCost(unsigned Opcode, Type *Dst,
}		}

// If we are unable to perform the extend for free, get the default cost.		// If we are unable to perform the extend for free, get the default cost.
return Cost + getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::None,		return Cost + getCastInstrCost(Opcode, Dst, Src, TTI::CastContextHint::None,
CostKind);		CostKind);
}		}

unsigned AArch64TTIImpl::getCFInstrCost(unsigned Opcode,		unsigned AArch64TTIImpl::getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Opcode == Instruction::PHI ? 0 : 1;		return Opcode == Instruction::PHI ? 0 : 1;
assert(CostKind == TTI::TCK_RecipThroughput && "unexpected CostKind");		assert(CostKind == TTI::TCK_RecipThroughput && "unexpected CostKind");
// Branches are assumed to be predicted.		// Branches are assumed to be predicted.
return 0;		return 0;
}		}

int AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,		int AArch64TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
▲ Show 20 Lines • Show All 746 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

Show First 20 Lines • Show All 157 Lines • ▼ Show 20 Lines	int getArithmeticInstrCost(
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr);

bool isInlineAsmSourceOfDivergence(const CallInst *CI,		bool isInlineAsmSourceOfDivergence(const CallInst *CI,
ArrayRef<unsigned> Indices = {}) const;		ArrayRef<unsigned> Indices = {}) const;

int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);
bool isSourceOfDivergence(const Value *V) const;		bool isSourceOfDivergence(const Value *V) const;
bool isAlwaysUniform(const Value *V) const;		bool isAlwaysUniform(const Value *V) const;

▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	public:
unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const;		unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const;
bool isLegalToVectorizeMemChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeMemChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;
bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr);
int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

//===- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass -----------===//		//===- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass -----------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// \file		// \file
Show All 21 Lines	static cl::opt<unsigned> UnrollThresholdPrivate(
cl::desc("Unroll threshold for AMDGPU if private memory used in a loop"),		cl::desc("Unroll threshold for AMDGPU if private memory used in a loop"),
cl::init(2700), cl::Hidden);		cl::init(2700), cl::Hidden);

static cl::opt<unsigned> UnrollThresholdLocal(		static cl::opt<unsigned> UnrollThresholdLocal(
"amdgpu-unroll-threshold-local",		"amdgpu-unroll-threshold-local",
cl::desc("Unroll threshold for AMDGPU if local memory used in a loop"),		cl::desc("Unroll threshold for AMDGPU if local memory used in a loop"),
cl::init(1000), cl::Hidden);		cl::init(1000), cl::Hidden);

static cl::opt<unsigned> UnrollThresholdIf(		static cl::opt<unsigned> UnrollThresholdIf(
"amdgpu-unroll-threshold-if",		"amdgpu-unroll-threshold-if",
cl::desc("Unroll threshold increment for AMDGPU for each if statement inside loop"),		cl::desc("Unroll threshold increment for AMDGPU for each if statement inside loop"),
cl::init(150), cl::Hidden);		cl::init(200), cl::Hidden);

static cl::opt<bool> UnrollRuntimeLocal(		static cl::opt<bool> UnrollRuntimeLocal(
"amdgpu-unroll-runtime-local",		"amdgpu-unroll-runtime-local",
cl::desc("Allow runtime unroll for AMDGPU if local memory used in a loop"),		cl::desc("Allow runtime unroll for AMDGPU if local memory used in a loop"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool> UseLegacyDA(		static cl::opt<bool> UseLegacyDA(
"amdgpu-use-legacy-divergence-analysis",		"amdgpu-use-legacy-divergence-analysis",
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines

void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP) {		TTI::UnrollingPreferences &UP) {
const Function &F = *L->getHeader()->getParent();		const Function &F = *L->getHeader()->getParent();
UP.Threshold = AMDGPU::getIntegerAttribute(F, "amdgpu-unroll-threshold", 300);		UP.Threshold = AMDGPU::getIntegerAttribute(F, "amdgpu-unroll-threshold", 300);
UP.MaxCount = std::numeric_limits<unsigned>::max();		UP.MaxCount = std::numeric_limits<unsigned>::max();
UP.Partial = true;		UP.Partial = true;

		// Conditional branch in a loop back edge needs 3 additional exec
		// manipulations in average.
		UP.BEInsns += 3;

// TODO: Do we want runtime unrolling?		// TODO: Do we want runtime unrolling?

// Maximum alloca size than can fit registers. Reserve 16 registers.		// Maximum alloca size than can fit registers. Reserve 16 registers.
const unsigned MaxAlloca = (256 - 16) * 4;		const unsigned MaxAlloca = (256 - 16) * 4;
unsigned ThresholdPrivate = UnrollThresholdPrivate;		unsigned ThresholdPrivate = UnrollThresholdPrivate;
unsigned ThresholdLocal = UnrollThresholdLocal;		unsigned ThresholdLocal = UnrollThresholdLocal;

// If this loop has the amdgpu.loop.unroll.threshold metadata we will use the		// If this loop has the amdgpu.loop.unroll.threshold metadata we will use the
▲ Show 20 Lines • Show All 687 Lines • ▼ Show 20 Lines	if (any_of(ValidSatTys, [&LT](MVT M) { return M == LT.second; }))
NElts = 1;		NElts = 1;
break;		break;
}		}

return LT.first * NElts * InstRate;		return LT.first * NElts * InstRate;
}		}

unsigned GCNTTIImpl::getCFInstrCost(unsigned Opcode,		unsigned GCNTTIImpl::getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind,
if (CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency)		const Instruction *I) {
return Opcode == Instruction::PHI ? 0 : 1;		assert((I == nullptr || I->getOpcode() == Opcode) &&
		"Opcode should reflect passed instruction.");
// XXX - For some reason this isn't called for switch.		const bool SCost =
		(CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency);
		const int CBrCost = SCost ? 5 : 7;
switch (Opcode) {		switch (Opcode) {
case Instruction::Br:		case Instruction::Br: {
		// Branch instruction takes about 4 slots on gfx900.
		auto BI = dyn_cast_or_null<BranchInst>(I);
		if (BI && BI->isUnconditional())
		return SCost ? 1 : 4;
		// Suppose conditional branch takes additional 3 exec manipulations
		// instructions in average.
		return CBrCost;
		}
		case Instruction::Switch: {
		auto SI = dyn_cast_or_null<SwitchInst>(I);
		// Each case (including default) takes 1 cmp + 1 cbr instructions in
		// average.
		return (SI ? (SI->getNumCases() + 1) : 4) * (CBrCost + 1);
		}
case Instruction::Ret:		case Instruction::Ret:
return 10;		return SCost ? 1 : 10;
default:		case Instruction::PHI:
return BaseT::getCFInstrCost(Opcode, CostKind);		// TODO: 1. A prediction phi won't be eliminated?
		// 2. Estimate data copy instructions in this case.
		return 1;
}		}
		return BaseT::getCFInstrCost(Opcode, CostKind, I);
}		}

int GCNTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		int GCNTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
bool IsPairwise,		bool IsPairwise,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
EVT OrigTy = TLI->getValueType(DL, Ty);		EVT OrigTy = TLI->getValueType(DL, Ty);

// Computes cost on targets that have packed math instructions(which support		// Computes cost on targets that have packed math instructions(which support
▲ Show 20 Lines • Show All 455 Lines • ▼ Show 20 Lines	unsigned R600TTIImpl::getMaxInterleaveFactor(unsigned VF) {
// TODO: Enable this again.		// TODO: Enable this again.
if (VF == 1)		if (VF == 1)
return 1;		return 1;

return 8;		return 8;
}		}

unsigned R600TTIImpl::getCFInstrCost(unsigned Opcode,		unsigned R600TTIImpl::getCFInstrCost(unsigned Opcode,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency)		if (CostKind == TTI::TCK_CodeSize || CostKind == TTI::TCK_SizeAndLatency)
return Opcode == Instruction::PHI ? 0 : 1;		return Opcode == Instruction::PHI ? 0 : 1;

// XXX - For some reason this isn't called for switch.		// XXX - For some reason this isn't called for switch.
switch (Opcode) {		switch (Opcode) {
case Instruction::Br:		case Instruction::Br:
case Instruction::Ret:		case Instruction::Ret:
return 10;		return 10;
default:		default:
return BaseT::getCFInstrCost(Opcode, CostKind);		return BaseT::getCFInstrCost(Opcode, CostKind, I);
}		}
}		}

int R600TTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,		int R600TTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
unsigned Index) {		unsigned Index) {
switch (Opcode) {		switch (Opcode) {
case Instruction::ExtractElement:		case Instruction::ExtractElement:
case Instruction::InsertElement: {		case Instruction::InsertElement: {

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

		public:
bool preferInLoopReduction(unsigned Opcode, Type *Ty,		bool preferInLoopReduction(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,		bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

bool shouldExpandReduction(const IntrinsicInst *II) const { return false; }		bool shouldExpandReduction(const IntrinsicInst *II) const { return false; }

int getCFInstrCost(unsigned Opcode,		int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
TTI::TargetCostKind CostKind);		const Instruction *I = nullptr);

int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

		if (isSSATMinMaxPattern(Inst, Imm) ||
(isa<ICmpInst>(Inst) && Inst->hasOneUse() &&		(isa<ICmpInst>(Inst) && Inst->hasOneUse() &&
isSSATMinMaxPattern(cast<Instruction>(*Inst->user_begin()), Imm)))		isSSATMinMaxPattern(cast<Instruction>(*Inst->user_begin()), Imm)))
return 0;		return 0;
}		}

return getIntImmCost(Imm, Ty, CostKind);		return getIntImmCost(Imm, Ty, CostKind);
}		}

int ARMTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		int ARMTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (CostKind == TTI::TCK_RecipThroughput &&		if (CostKind == TTI::TCK_RecipThroughput &&
(ST->hasNEON() || ST->hasMVEIntegerOps())) {		(ST->hasNEON() || ST->hasMVEIntegerOps())) {
// FIXME: The vectorizer is highly sensistive to the cost of these		// FIXME: The vectorizer is highly sensistive to the cost of these
// instructions, which suggests that it may be using the costs incorrectly.		// instructions, which suggests that it may be using the costs incorrectly.
// But, for now, just make them free to avoid performance regressions for		// But, for now, just make them free to avoid performance regressions for
// vector targets.		// vector targets.
return 0;		return 0;
}		}
return BaseT::getCFInstrCost(Opcode, CostKind);		return BaseT::getCFInstrCost(Opcode, CostKind, I);
}		}

int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int ARMTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH,		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h

		unsigned getArithmeticInstrCost(
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH,		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr) {
return 1;		return 1;
}		}

bool isLegalMaskedStore(Type *DataType, Align Alignment);		bool isLegalMaskedStore(Type *DataType, Align Alignment);
bool isLegalMaskedLoad(Type *DataType, Align Alignment);		bool isLegalMaskedLoad(Type *DataType, Align Alignment);

/// @}		/// @}


llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

		int getArithmeticInstrCost(
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, ArrayRef<int> Mask,		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, ArrayRef<int> Mask,
int Index, Type *SubTp);		int Index, Type *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		int getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr);
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);
int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,		int getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

		int PPCTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp,
// (at least in the sense that there need only be one non-loop-invariant		// (at least in the sense that there need only be one non-loop-invariant
// instruction). We need one such shuffle instruction for each actual		// instruction). We need one such shuffle instruction for each actual
// register (this is not true for arbitrary shuffles, but is true for the		// register (this is not true for arbitrary shuffles, but is true for the
// structured types of shuffles covered by TTI::ShuffleKind).		// structured types of shuffles covered by TTI::ShuffleKind).
return vectorCostAdjustment(LT.first, Instruction::ShuffleVector, Tp,		return vectorCostAdjustment(LT.first, Instruction::ShuffleVector, Tp,
nullptr);		nullptr);
}		}

int PPCTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		int PPCTTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Opcode == Instruction::PHI ? 0 : 1;		return Opcode == Instruction::PHI ? 0 : 1;
// Branches are assumed to be predicted.		// Branches are assumed to be predicted.
return CostKind == TTI::TCK_RecipThroughput ? 0 : 1;		return 0;
}		}

int PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int PPCTTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH,		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) {		const Instruction *I) {
assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");		assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");


llvm/lib/Target/X86/X86TargetTransformInfo.h

		int getInterleavedMemoryOpCostAVX2(
ArrayRef<unsigned> Indices, Align Alignment, unsigned AddressSpace,		ArrayRef<unsigned> Indices, Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,		TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);

int getIntImmCost(int64_t);		int getIntImmCost(int64_t);

int getIntImmCost(const APInt &Imm, Type *Ty, TTI::TargetCostKind CostKind);		int getIntImmCost(const APInt &Imm, Type *Ty, TTI::TargetCostKind CostKind);

unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);		unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
		const Instruction *I = nullptr);

int getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,		int getIntImmCostInst(unsigned Opcode, unsigned Idx, const APInt &Imm,
Type *Ty, TTI::TargetCostKind CostKind,		Type *Ty, TTI::TargetCostKind CostKind,
Instruction *Inst = nullptr);		Instruction *Inst = nullptr);
int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty, TTI::TargetCostKind CostKind);		Type *Ty, TTI::TargetCostKind CostKind);
bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,		bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
TargetTransformInfo::LSRCost &C2);		TargetTransformInfo::LSRCost &C2);

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

		int X86TTIImpl::getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
case Intrinsic::experimental_patchpoint_i64:		case Intrinsic::experimental_patchpoint_i64:
if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue())))		if ((Idx < 4) || (Imm.getBitWidth() <= 64 && isInt<64>(Imm.getSExtValue())))
return TTI::TCC_Free;		return TTI::TCC_Free;
break;		break;
}		}
return X86TTIImpl::getIntImmCost(Imm, Ty, CostKind);		return X86TTIImpl::getIntImmCost(Imm, Ty, CostKind);
}		}

unsigned		unsigned X86TTIImpl::getCFInstrCost(unsigned Opcode,
X86TTIImpl::getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind,
		const Instruction *I) {
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return Opcode == Instruction::PHI ? 0 : 1;		return Opcode == Instruction::PHI ? 0 : 1;
// Branches are assumed to be predicted.		// Branches are assumed to be predicted.
return CostKind == TTI::TCK_RecipThroughput ? 0 : 1;		return 0;
}		}

int X86TTIImpl::getGatherOverhead() const {		int X86TTIImpl::getGatherOverhead() const {
// Some CPUs have more overhead for gather. The specified overhead is relative		// Some CPUs have more overhead for gather. The specified overhead is relative
// to the Load operation. "2" is the number provided by Intel architects. This		// to the Load operation. "2" is the number provided by Intel architects. This
// parameter is used for cost estimation of Gather Op and comparison with		// parameter is used for cost estimation of Gather Op and comparison with
// other alternatives.		// other alternatives.
// TODO: Remove the explicit hasAVX512()?, That would mean we would only		// TODO: Remove the explicit hasAVX512()?, That would mean we would only
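The PPC and X86 hunks above only drop a redundant ternary: once the early return has handled every other cost kind, CostKind is known to be TCK_RecipThroughput, so the old expression could only evaluate to 0. The following is a minimal standalone sketch of that equivalence, not code from the patch. `TargetCostKind` here is a local stand-in enum that reuses LLVM's enumerator names, and `oldCost`, `newCost`, and `formsAgree` are hypothetical helper names introduced for illustration.

```cpp
#include <cassert>

// Local stand-in for TTI::TargetCostKind (assumption: only the names matter here).
enum TargetCostKind {
  TCK_RecipThroughput,
  TCK_Latency,
  TCK_CodeSize,
  TCK_SizeAndLatency
};

// Shape of the code before the patch.
int oldCost(bool IsPHI, TargetCostKind K) {
  if (K != TCK_RecipThroughput)
    return IsPHI ? 0 : 1;
  // K can only be TCK_RecipThroughput at this point, so this is always 0.
  return K == TCK_RecipThroughput ? 0 : 1;
}

// Shape of the code after the patch.
int newCost(bool IsPHI, TargetCostKind K) {
  if (K != TCK_RecipThroughput)
    return IsPHI ? 0 : 1;
  return 0;
}

// Exhaustively compares both forms over every cost kind and both opcode classes.
bool formsAgree() {
  const TargetCostKind Kinds[] = {TCK_RecipThroughput, TCK_Latency,
                                  TCK_CodeSize, TCK_SizeAndLatency};
  const bool PHIFlags[] = {false, true};
  for (TargetCostKind K : Kinds)
    for (bool IsPHI : PHIFlags)
      if (oldCost(IsPHI, K) != newCost(IsPHI, K))
        return false;
  return true;
}
```

Calling `formsAgree()` from a test checks that the simplification changes no observable result; the only functional change for these two targets is the extra `const Instruction *I` parameter in the signature.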

llvm/test/Analysis/CostModel/AMDGPU/br.ll

This file was deleted.

	; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck %s

	; CHECK: 'test_br_cost'
	; CHECK: estimated cost of 10 for instruction: br i1
	; CHECK: estimated cost of 10 for instruction: br label
	; CHECK: estimated cost of 10 for instruction: ret void
	define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
	bb0:
	br i1 undef, label %bb1, label %bb2

	bb1:
	%vec = load i32, i32 addrspace(1)* %vaddr
	%add = add i32 %vec, %b
	store i32 %add, i32 addrspace(1)* %out
	br label %bb2

	bb2:
	ret void

	}

	; CHECK: 'test_switch_cost'
	; CHECK: estimated cost of -1 for instruction: switch
	define amdgpu_kernel void @test_switch_cost(i32 %a) #0 {
	entry:
	switch i32 %a, label %default [
	i32 0, label %case0
	i32 1, label %case1
	]

	case0:
	store volatile i32 undef, i32 addrspace(1)* undef
	ret void

	case1:
	store volatile i32 undef, i32 addrspace(1)* undef
	ret void

	default:
	store volatile i32 undef, i32 addrspace(1)* undef
	ret void

	end:
	ret void
	}

llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll

This file was added.

				; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SPEED %s
				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SIZE %s

				; ALL-LABEL: 'test_br_cost'
				; SPEED: estimated cost of 7 for instruction: br i1
				; SPEED: estimated cost of 4 for instruction: br label
				; SPEED: estimated cost of 1 for instruction: %phi = phi i32 [
				; SPEED: estimated cost of 10 for instruction: ret void
				; SIZE: estimated cost of 5 for instruction: br i1
				; SIZE: estimated cost of 1 for instruction: br label
				; SIZE: estimated cost of 1 for instruction: %phi = phi i32 [
				; SIZE: estimated cost of 1 for instruction: ret void
				define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
				bb0:
				br i1 undef, label %bb1, label %bb2

				bb1:
				%vec = load i32, i32 addrspace(1)* %vaddr
				%add = add i32 %vec, %b
				store i32 %add, i32 addrspace(1)* %out
				br label %bb2

				bb2:
				%phi = phi i32 [ %b, %bb0 ], [ %add, %bb1 ]
				ret void
				}

				; ALL-LABEL: 'test_switch_cost'
				; SPEED: estimated cost of 24 for instruction: switch
				; SIZE: estimated cost of 18 for instruction: switch
				define amdgpu_kernel void @test_switch_cost(i32 %a) #0 {
				entry:
				switch i32 %a, label %default [
				i32 0, label %case0
				i32 1, label %case1
				]

				case0:
				store volatile i32 undef, i32 addrspace(1)* undef
				ret void

				case1:
				store volatile i32 undef, i32 addrspace(1)* undef
				ret void

				default:
				store volatile i32 undef, i32 addrspace(1)* undef
				ret void

				end:
				ret void
				}
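The expected switch costs in control-flow.ll can be reproduced from the formula in GCNTTIImpl::getCFInstrCost, `(getNumCases() + 1) * (CBrCost + 1)`. The sketch below is a standalone model, not code from the patch. The names `CostKind`, `condBranchCost`, `uncondBranchCost`, and `switchCost` are hypothetical, and the conditional-branch costs (7 for throughput, 5 for code size) are inferred from the `br i1` CHECK lines in this test, since the definition of CBrCost lies outside the hunk shown here.

```cpp
#include <cassert>

// Hypothetical two-valued cost kind: throughput (SPEED prefix) or code size (SIZE prefix).
enum class CostKind { Throughput, CodeSize };

// Conditional branch cost, inferred from the 'br i1' CHECK lines (assumption).
constexpr int condBranchCost(CostKind K) {
  return K == CostKind::CodeSize ? 5 : 7;
}

// Unconditional branch cost, mirroring 'return SCost ? 1 : 4;' in the diff.
constexpr int uncondBranchCost(CostKind K) {
  return K == CostKind::CodeSize ? 1 : 4;
}

// Each case, including the default, is modelled as 1 compare + 1 conditional branch.
constexpr int switchCost(unsigned NumCases, CostKind K) {
  return static_cast<int>(NumCases + 1) * (condBranchCost(K) + 1);
}

// test_switch_cost has two explicit cases plus the default.
static_assert(switchCost(2, CostKind::Throughput) == 24, "SPEED: 3 * (7 + 1)");
static_assert(switchCost(2, CostKind::CodeSize) == 18, "SIZE: 3 * (5 + 1)");
```

With two explicit cases plus the default, the model gives 3 * 8 = 24 for throughput and 3 * 6 = 18 for code size, which agrees with the SPEED and SIZE lines for 'test_switch_cost'.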

llvm/test/CodeGen/AMDGPU/unroll.ll

	; CHECK-NEXT: store			; CHECK-NEXT: store
	; CHECK-NOT: br			; CHECK-NOT: br
	define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {			define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.inc			for.body: ; preds = %entry, %for.inc
	%i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]			%i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
	%and = and i32 %i1, 1			%tobool = icmp eq i32 %i1, 0
	%tobool = icmp eq i32 %and, 0
	br i1 %tobool, label %for.inc, label %if.then			br i1 %tobool, label %for.inc, label %if.then

	if.then: ; preds = %for.body			if.then: ; preds = %for.body
	%0 = sext i32 %i1 to i64			%0 = sext i32 %i1 to i64
	%arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0			%arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0
	store i32 0, i32 addrspace(5)* %arrayidx, align 4			store i32 0, i32 addrspace(5)* %arrayidx, align 4
	br label %for.inc			br label %for.inc

	for.inc: ; preds = %for.body, %if.then			for.inc: ; preds = %for.body, %if.then
	%inc = add nuw nsw i32 %i1, 1			%inc = add nuw nsw i32 %i1, 1
	%cmp = icmp ult i32 %inc, 48			%cmp = icmp ult i32 %inc, 38
	br i1 %cmp, label %for.body, label %for.end			br i1 %cmp, label %for.body, label %for.end

	for.end: ; preds = %for.cond			for.end: ; preds = %for.cond
	ret void			ret void
	}			}

	; Check that runtime unroll is enabled for local memory references			; Check that runtime unroll is enabled for local memory references


llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-cost-addrspacecast.ll

	; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -mcpu=hawaii -loop-unroll -unroll-threshold=75 -unroll-peel-count=0 -unroll-allow-partial=false -unroll-max-iteration-count-to-analyze=16 < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -mcpu=hawaii -loop-unroll -unroll-threshold=49 -unroll-peel-count=0 -unroll-allow-partial=false -unroll-max-iteration-count-to-analyze=16 < %s | FileCheck %s

	; CHECK-LABEL: @test_func_addrspacecast_cost_noop(			; CHECK-LABEL: @test_func_addrspacecast_cost_noop(
	; CHECK-NOT: br i1			; CHECK-NOT: br i1
	define amdgpu_kernel void @test_func_addrspacecast_cost_noop(float addrspace(1)* noalias nocapture %out, float addrspace(1)* noalias nocapture %in) #0 {			define amdgpu_kernel void @test_func_addrspacecast_cost_noop(float addrspace(1)* noalias nocapture %out, float addrspace(1)* noalias nocapture %in) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
