This is an archive of the discontinued LLVM Phabricator instance.


Differential D109388

[AArch64][CostModel] Use cost of target trunc type when only use of a non-register sized load
ClosedPublic

Authored by AndrewLitteken on Sep 7 2021, 12:58 PM.

Details

Reviewers

fhahn
paquette
samparker

Commits

rG4ff4e7ea3033: [CostModel] Use cost of target trunc type when only it is the only use of a non…

Summary

The code size cost model for AArch64 uses the legalization cost of the type a load reads through its pointer. If the load is followed directly by a trunc instruction, and that trunc is the only use of the load's result, only one instruction is generated in the target assembly. This patch adds a check for that case and, when it applies, costs the load using the destination type of the trunc instruction.
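
To make the summary concrete, here is a minimal sketch of the idea as a toy model. It does not use any LLVM API: `ToyLoad`, `partsNeeded` and `toyLoadCodeSizeCost` are hypothetical names, and "cost" is assumed to be simply the number of register-sized pieces a value needs, which is a simplification of what real targets compute.

```cpp
#include <cassert>

// Toy stand-in for legalization: how many register-sized pieces are needed
// to hold a value of `bits` bits. Real targets use their own cost tables.
unsigned partsNeeded(unsigned bits, unsigned regBits) {
  assert(bits > 0 && regBits > 0 && "widths must be positive");
  return (bits + regBits - 1) / regBits; // ceiling division
}

// Hypothetical description of a scalar integer load and its users.
struct ToyLoad {
  unsigned loadedBits;    // width of the loaded integer type
  unsigned numUses;       // number of instructions using the loaded value
  bool onlyUserIsTrunc;   // true if the single user is a trunc
  unsigned truncDestBits; // destination width of that trunc (ignored otherwise)
};

// Code-size cost of the load in the toy model: if the only use is a trunc,
// cost the load at the trunc's destination width instead of the loaded width.
unsigned toyLoadCodeSizeCost(const ToyLoad &load, unsigned regBits) {
  unsigned bits = load.loadedBits;
  if (load.numUses == 1 && load.onlyUserIsTrunc)
    bits = load.truncDestBits;
  return partsNeeded(bits, regBits);
}
```

With 64-bit registers, an i128 load whose only use is a trunc to i32 costs 1 in this model, while a plain i128 load costs 2. That matches the AArch64 expectations in the tests of this revision, although other targets report different numbers.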

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

AndrewLitteken created this revision. Sep 7 2021, 12:58 PM

Herald added subscribers: hiraditya, kristof.beyls. Sep 7 2021, 12:58 PM

AndrewLitteken requested review of this revision. Sep 7 2021, 12:58 PM

Herald added a project: Restricted Project. Sep 7 2021, 12:58 PM

Herald added a subscriber: llvm-commits.

AndrewLitteken added inline comments. Sep 7 2021, 12:59 PM

llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
28	Add new line

Harbormaster completed remote builds in B122926: Diff 371157. Sep 7 2021, 1:49 PM

samparker added inline comments. Sep 9 2021, 5:49 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1462 ↗	(On Diff #371157)	Looks like you're better off calculating the cost from the trunc, in getCastInstrCost, instead of here. If you really only mean code size cost, but the trunc is probably free for all costs for all legal loads, remember to also check that CostKind == TTI::TCK_CodeSize too.
llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
11	Even if the trunc is free, we're still going to pay quite a bit for legalizing the unusual load. If this test is showing the cost of the load, then it doesn't make sense.
19	This check doesn't look right.

Bah! I'm too used to thinking about extend in this case rather than trunc! Please ignore my previous comments!

What stage of the compiler are you needing this modelling to be done? I would have thought that this gets simplified quite early on.

This was found because of the IR Outliner overestimating the size cost of these sorts of patterns. The IR Outliner is currently positioned later in the size based optimizations when it is turned on, but could in theory be placed at any point in the pipeline.

In general, you can't expect the costmodel to reverse-engineer every optimization that might happen in the llvm pipeline. There are too many to be sensibly captured and the IR needs to be at least somewhat representative of what the backend will see. I don't think there is anything before ISel that will split up a trunc of a load like this though.

Do you plan to do this for all architectures?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1464 ↗	(On Diff #371157)	I don't believe this will be correct for vectors.
llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
9	An i128 would show what you mean here, without being so large. It is also worth adding a few extra sizes, for things like an i64 load truncated to i32/i8 etc. They will be much more common. Also use the update_analyze_test_checks script.

In D109388#2992635, @dmgreen wrote:

In general, you can't expect the costmodel to reverse-engineer every optimization that might happen in the llvm pipeline. There are too many to be sensibly captured and the IR needs to be at least somewhat representative of what the backend will see. I don't think there is anything before ISel that will split up a trunc of a load like this though.

Do you plan to do this for all architectures?

Would it make more sense to have a check for this on the outliner side, and make a special call to getMemoryOpCost for load to trunc instructions then?

Would it make more sense to have a check for this on the outliner side, and make a special call to getMemoryOpCost for load to trunc instructions then?

Does this come up a lot? I would be surprised, I think it only makes a difference for > 64bit loads on AArch64? For 32bit architectures I could see it happening more.

I've no objections to it being in the backend costmodel, it sounds more correct for how llvm is set up right now, but as far as I understand it should apply to all backends equally.
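
The point about register width can be illustrated with a small worked calculation. This is only a sketch under a simplifying assumption: the cost of a load is taken to be the number of register-sized pieces the loaded value occupies. The names `registersFor` and `savedByTruncWidth` are hypothetical and not part of LLVM.

```cpp
#include <cassert>

// Number of registers of width `regBits` needed for a `bits`-wide value.
unsigned registersFor(unsigned bits, unsigned regBits) {
  assert(bits > 0 && regBits > 0 && "widths must be positive");
  return (bits + regBits - 1) / regBits; // ceiling division
}

// How much smaller the toy cost becomes when a load of `loadBits` is costed
// at the width of its truncated result (`truncBits`) instead.
unsigned savedByTruncWidth(unsigned loadBits, unsigned truncBits,
                           unsigned regBits) {
  assert(truncBits < loadBits && "a trunc must narrow the value");
  return registersFor(loadBits, regBits) - registersFor(truncBits, regBits);
}
```

Under this assumption, an i64 load truncated to i32 saves nothing with 64-bit registers, because both widths fit in one register, but saves one unit with 32-bit registers. With 64-bit registers the difference only appears once the load is wider than 64 bits, for example i128 truncated to i32.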

Reworking so change is applicable to all targets.

Herald added subscribers: frasercrmck, kerbowa, luismarques and 23 others. Sep 13 2021, 11:04 AM

In D109388#2992706, @dmgreen wrote:

Does this come up a lot? I would be surprised, I think it only makes a difference for > 64bit loads on AArch64? For 32bit architectures I could see it happening more.

I'm not entirely sure about the frequency, but we found it to occur when compiling Swift code and extracting certain pieces from more complex classes.

paquette added inline comments. Sep 13 2021, 11:29 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1033	comment explaining why?

Harbormaster completed remote builds in B123706: Diff 372295. Sep 13 2021, 11:51 AM

Adding comment explaining the conditional.

Harbormaster completed remote builds in B124026: Diff 372723. Sep 15 2021, 9:58 AM

In theory, I think this is probably okay.

Without the outliner, how does this impact code size on CTMark?

Herald added a subscriber: luke957. Jan 5 2022, 3:49 PM

There were no changes in the size of CTMark with -O2 or -Os optimizations without the outliner turned on.

No objections from me :)

I think this is good to go then!

This revision is now accepted and ready to land. Jan 10 2022, 10:23 AM

This revision was landed with ongoing or failed builds. Jan 12 2022, 4:04 PM

Closed by commit rG4ff4e7ea3033: [CostModel] Use cost of target trunc type when only it is the only use of a non… (authored by AndrewLitteken).

This revision was automatically updated to reflect the committed changes.

AndrewLitteken added a commit: rG4ff4e7ea3033: [CostModel] Use cost of target trunc type when only it is the only use of a non….

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	15 lines
llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll	27 lines
llvm/test/Analysis/CostModel/AMDGPU/load-to-trunc.ll	27 lines
llvm/test/Analysis/CostModel/ARM/load-to-trunc.ll	28 lines
llvm/test/Analysis/CostModel/PowerPC/load-to-trunc.ll	26 lines
llvm/test/Analysis/CostModel/RISCV/load-to-trunc.ll	27 lines
llvm/test/Analysis/CostModel/SystemZ/load-to-trunc.ll	27 lines
llvm/test/Analysis/CostModel/X86/load-to-trunc.ll	28 lines

Diff 399505

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

(1,024 lines not shown) InstructionCost getUserCost(const User *U, ArrayRef<const Value *> Operands,
     case Instruction::And:
     case Instruction::Or:
     case Instruction::Xor:
     case Instruction::FNeg: {
       TTI::OperandValueProperties Op1VP = TTI::OP_None;
       TTI::OperandValueProperties Op2VP = TTI::OP_None;
       TTI::OperandValueKind Op1VK =
         TTI::getOperandInfo(U->getOperand(0), Op1VP);
       TTI::OperandValueKind Op2VK = Opcode != Instruction::FNeg ?
         TTI::getOperandInfo(U->getOperand(1), Op2VP) : TTI::OK_AnyValue;
       SmallVector<const Value *, 2> Operands(U->operand_values());
       return TargetTTI->getArithmeticInstrCost(Opcode, Ty, CostKind,
                                                Op1VK, Op2VK,
                                                Op1VP, Op2VP, Operands, I);
     }
     case Instruction::IntToPtr:
     case Instruction::PtrToInt:
(14 lines not shown) case Instruction::Store: {
       auto *SI = cast<StoreInst>(U);
       Type *ValTy = U->getOperand(0)->getType();
       return TargetTTI->getMemoryOpCost(Opcode, ValTy, SI->getAlign(),
                                         SI->getPointerAddressSpace(),
                                         CostKind, I);
     }
     case Instruction::Load: {
       auto *LI = cast<LoadInst>(U);
-      return TargetTTI->getMemoryOpCost(Opcode, U->getType(), LI->getAlign(),
+      Type *LoadType = U->getType();
+      // If there is a non-register sized type, the cost estimation may expand
+      // it to be several instructions to load into multiple registers on the
+      // target. But, if the only use of the load is a trunc instruction to a
+      // register sized type, the instruction selector can combine these
+      // instructions to be a single load. So, in this case, we use the
+      // destination type of the trunc instruction rather than the load to
+      // accurately estimate the cost of this load instruction.
+      if (CostKind == TTI::TCK_CodeSize && LI->hasOneUse() &&
+          !LoadType->isVectorTy()) {
+        if (const TruncInst *TI = dyn_cast<TruncInst>(*LI->user_begin()))
+          LoadType = TI->getDestTy();
+      }
+      return TargetTTI->getMemoryOpCost(Opcode, LoadType, LI->getAlign(),
                                         LI->getPointerAddressSpace(),
                                         CostKind, I);
     }
     case Instruction::Select: {
       const Value *Op0, *Op1;
       if (match(U, m_LogicalAnd(m_Value(Op0), m_Value(Op1))) ||
           match(U, m_LogicalOr(m_Value(Op0), m_Value(Op1)))) {
         // select x, y, false --> x & y
(159 lines not shown)
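
The guard in the Load case combines three conditions before it looks for a trunc: the cost kind is code size, the load has exactly one use, and the loaded type is not a vector. The sketch below mirrors that decision in a self-contained toy model. It is an illustration only and does not use LLVM types: `ToyCostKind`, `ToyType`, `ToyLoadInfo` and `typeUsedForLoadCost` are hypothetical names, and a null `truncDest` pointer stands in for a `dyn_cast<TruncInst>` that fails.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cost kinds a query can ask about.
enum class ToyCostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

struct ToyType {
  unsigned bits; // total width in bits
  bool isVector; // true for vector types
};

struct ToyLoadInfo {
  ToyType loadType;         // type produced by the load
  unsigned numUses;         // number of uses of the loaded value
  const ToyType *truncDest; // null unless the first user is a trunc
};

// Picks the type that the memory-op cost query should be asked about.
ToyType typeUsedForLoadCost(const ToyLoadInfo &load, ToyCostKind kind) {
  assert(load.loadType.bits > 0 && "loaded type must have a width");
  ToyType ty = load.loadType;
  // Only for code size, a single use, and scalar (non-vector) loads.
  if (kind == ToyCostKind::CodeSize && load.numUses == 1 && !ty.isVector) {
    if (load.truncDest != nullptr)
      ty = *load.truncDest; // cost the load at the trunc's destination type
  }
  return ty;
}

// Sample types for trying the function out.
const ToyType I128{128, false};
const ToyType I32{32, false};
const ToyType V4I32{128, true};
```

For example, `typeUsedForLoadCost(ToyLoadInfo{I128, 1, &I32}, ToyCostKind::CodeSize)` yields the 32-bit type, while the same load queried with `ToyCostKind::Latency`, with two uses, or with a vector type keeps its original type.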

llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not.

				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=aarch64--linux-gnu < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/AMDGPU/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not.

				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/ARM/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not. Currently, this target does not have
				; that expansion.

				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=armv8r-none-eabi < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/PowerPC/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not.
				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/RISCV/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not.

				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=riscv64 < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/SystemZ/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost if it is not. This target does not currently perform
				; the expansion in the cost modelling.
				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=systemz-unknown < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}

llvm/test/Analysis/CostModel/X86/load-to-trunc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
				; Check memory cost model action for a load of an unusually sized integer
				; follow by and a trunc to a register sized integer gives a cost of 1 rather
				; than the expanded cost. Currently the x86 code size cost model does not use
				; the expanded cost and only assigns a cost of 1 to each load.

				; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=x86_64--linux-gnu < %s | FileCheck %s --check-prefix=CHECK

				; Check that cost is 1 for unusual load to register sized load.
				define i32 @loadUnusualIntegerWithTrunc(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualIntegerWithTrunc'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %trunc = trunc i128 %out to i32
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 %trunc
				;
				%out = load i128, i128* %ptr
				%trunc = trunc i128 %out to i32
				ret i32 %trunc
				}

				define i128 @loadUnusualInteger(i128* %ptr) {
				; CHECK-LABEL: 'loadUnusualInteger'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %out = load i128, i128* %ptr, align 4
				; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i128 %out
				;
				%out = load i128, i128* %ptr
				ret i128 %out
				}
