This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] PHI node cost should not be counted for the size and latency.
ClosedPublic

Authored by alex-t on Jun 29 2021, 5:27 AM.


Details

Reviewers

rampitec
dfukalov

Commits

rGe585b332e423: [AMDGPU] PHI node cost should not be counted for the size and latency.

Summary

Details: https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes
under TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect: the value moves produced by
PHI lowering are inserted into the predecessors of the basic block, not into the block itself.
As a result, LoopRotate and LoopUnroll were broken by incorrect size/cost estimates for the loop header
and the loop body.
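
To make the intended behaviour concrete, here is a minimal standalone sketch of the control-flow cost logic as it looks after this revision. It is not the LLVM code: the `Opcode` and `CostKind` enums, `baseCFInstrCost`, and `gcnCFInstrCost` are hypothetical stand-ins, the branch costs are taken from the test expectations in control-flow.ll, and the fallback rule (a PHI is free unless throughput is being costed) is inferred from those same expectations (SPEED 1, SIZE 0). It also assumes that `SCost` covers both size-oriented cost kinds, as the summary suggests.

```cpp
#include <cassert>

// Hypothetical stand-ins for LLVM's opcode and cost-kind enums.
enum class Opcode { Br, CondBr, Switch, Ret, PHI };
enum class CostKind { RecipThroughput, CodeSize, SizeAndLatency };

// Stand-in for the generic fallback (assumption): a PHI is free unless
// throughput is being costed; everything else costs 1.
int baseCFInstrCost(Opcode Op, CostKind Kind) {
  if (Op == Opcode::PHI && Kind != CostKind::RecipThroughput)
    return 0;
  return 1;
}

// Simplified model of the target hook after this revision. NumSwitchCases
// defaults to 3 so that (cases + 1) matches the "4" used in the diff when
// no switch instruction is available.
int gcnCFInstrCost(Opcode Op, CostKind Kind, int NumSwitchCases = 3) {
  const bool SCost =
      Kind == CostKind::CodeSize || Kind == CostKind::SizeAndLatency;
  const int CBrCost = SCost ? 5 : 7; // values from the test expectations
  switch (Op) {
  case Opcode::Br: // unconditional branch
    return SCost ? 1 : 4;
  case Opcode::CondBr:
    return CBrCost;
  case Opcode::Switch: // each case takes 1 cmp + 1 cbr
    return (NumSwitchCases + 1) * (CBrCost + 1);
  case Opcode::Ret:
    return SCost ? 1 : 10;
  case Opcode::PHI:
    break; // no target-specific override any more
  }
  return baseCFInstrCost(Op, Kind);
}
```

With this model a PHI contributes nothing to a size-oriented estimate of its own block, while the throughput estimate is unchanged.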

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

alex-t created this revision. Jun 29 2021, 5:27 AM

Herald added subscribers: foad, kerbowa, hiraditya and 8 others. Jun 29 2021, 5:27 AM

alex-t requested review of this revision. Jun 29 2021, 5:27 AM

Herald added a project: Restricted Project. Jun 29 2021, 5:27 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B111493: Diff 355186. Jun 29 2021, 6:23 AM

dfukalov added inline comments. Jun 29 2021, 7:26 AM

llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll
8	Please update the sizes reported after your change instead of just removing the test lines here and below.

Test corrected

rampitec accepted this revision. Jun 29 2021, 1:41 PM

This revision is now accepted and ready to land. Jun 29 2021, 1:41 PM

Harbormaster completed remote builds in B111606: Diff 355345. Jun 29 2021, 1:52 PM

dfukalov added inline comments. Jun 29 2021, 3:51 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
841	Nit: please keep this or a similar TODO comment somewhere, since it has not been done.

alex-t added inline comments. Jun 30 2021, 5:11 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
841	Such a prediction is unlikely to be possible. The number of copies that survive register coalescing depends too heavily on the passes that run in between.
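
The point about where PHI copies end up can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not LLVM's actual PHI elimination: `Phi`, `Block`, `Function`, and `lowerPhis` are hypothetical names, instructions are plain strings, block terminators are left out, and no coalescing is modelled. Each PHI becomes one copy at the end of every predecessor, so the block that held the PHI does not grow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy IR (hypothetical): a PHI names its destination and maps each
// predecessor block to the incoming value.
struct Phi {
  std::string Dest;
  std::map<std::string, std::string> Incoming; // predecessor -> value
};

// A block holds its PHIs and its remaining (non-terminator) instructions.
struct Block {
  std::vector<Phi> Phis;
  std::vector<std::string> Insts;
};

using Function = std::map<std::string, Block>;

// Replace every PHI with one copy per incoming edge, appended to the
// corresponding predecessor. The block that owned the PHI gets nothing.
void lowerPhis(Function &F) {
  for (auto &Entry : F) {
    Block &BB = Entry.second;
    for (const Phi &P : BB.Phis)
      for (const auto &Edge : P.Incoming)
        F.at(Edge.first).Insts.push_back(P.Dest + " = copy " + Edge.second);
    BB.Phis.clear();
  }
}
```

For a loop header with one PHI fed from a preheader and a latch, lowering adds one copy to the preheader and one to the latch, and the header keeps its original instruction count. How many of those copies remain after register coalescing is exactly what the reply above says cannot be predicted at this level.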

xgupta mentioned this in D105186: [AMDGPU] PHI node cost should not be counted for the size and latency. Jun 30 2021, 6:09 AM

This revision was landed with ongoing or failed builds. Jun 30 2021, 6:11 AM

Closed by commit rGe585b332e423: [AMDGPU] PHI node cost should not be counted for the size and latency. (authored by alex-t).

This revision was automatically updated to reflect the committed changes.

alex-t added a commit: rGe585b332e423: [AMDGPU] PHI node cost should not be counted for the size and latency.

Harbormaster completed remote builds in B111734: Diff 355522. Jun 30 2021, 6:51 AM

Revision Contents

Path                                                    Size
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp    4 lines
llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll     2 lines

Diff 355522

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

 //===- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass -----------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 [... 823 lines not shown ...] InstructionCost GCNTTIImpl::getCFInstrCost(unsigned Opcode,
   case Instruction::Switch: {
     auto SI = dyn_cast_or_null<SwitchInst>(I);
     // Each case (including default) takes 1 cmp + 1 cbr instructions in
     // average.
     return (SI ? (SI->getNumCases() + 1) : 4) * (CBrCost + 1);
   }
   case Instruction::Ret:
     return SCost ? 1 : 10;
-  case Instruction::PHI:
-    // TODO: 1. A prediction phi won't be eliminated?
-    //       2. Estimate data copy instructions in this case.
-    return 1;
   }
   return BaseT::getCFInstrCost(Opcode, CostKind, I);
 }

 InstructionCost
 GCNTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
                                        bool IsPairwise,
                                        TTI::TargetCostKind CostKind) {
 [... 515 lines not shown ...]

llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll

 ; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SPEED %s
 ; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SIZE %s

 ; ALL-LABEL: 'test_br_cost'
 ; SPEED: estimated cost of 7 for instruction: br i1
 ; SPEED: estimated cost of 4 for instruction: br label
 ; SPEED: estimated cost of 1 for instruction: %phi = phi i32 [
 ; SPEED: estimated cost of 10 for instruction: ret void
 ; SIZE: estimated cost of 5 for instruction: br i1
 ; SIZE: estimated cost of 1 for instruction: br label
-; SIZE: estimated cost of 1 for instruction: %phi = phi i32 [
+; SIZE: estimated cost of 0 for instruction: %phi = phi i32 [
 ; SIZE: estimated cost of 1 for instruction: ret void
 define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
 bb0:
   br i1 undef, label %bb1, label %bb2

 bb1:
   %vec = load i32, i32 addrspace(1)* %vaddr
   %add = add i32 %vec, %b
 [... 33 lines not shown ...]