This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] PHI node cost should not be counted for the size and latency.
Abandoned · Public

Authored by alex-t on Jun 30 2021, 5:51 AM.


Details

Reviewers: None

Summary

  Details: https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes
  under TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves (copies)
  produced by PHI lowering are inserted into the predecessor blocks, not into the block that contains the PHI.
  As a result of that change, LoopRotate and LoopUnroll were broken by incorrect size/cost estimates for the
  loop header and the loop body.

  Fixes SWDEV-289429: 10-11% performance drop observed with ROC_OCL_Perf_Linpack_DGEMM_W32.

Differential Revision: https://reviews.llvm.org/D105104

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

alex-t created this revision. Jun 30 2021, 5:51 AM

Herald added subscribers: foad, kerbowa, hiraditya and 8 others. Jun 30 2021, 5:51 AM

alex-t requested review of this revision. Jun 30 2021, 5:51 AM

Herald added a project: Restricted Project. Jun 30 2021, 5:51 AM

Herald added subscribers: llvm-commits, wdng.

Seems it is a duplicate of D105104?
Instead of committing that patch, you created a new revision :)

alex-t abandoned this revision. Jun 30 2021, 6:10 AM

Harbormaster completed remote builds in B111731: Diff 355517. Jun 30 2021, 6:31 AM

Revision Contents

Path                                                    Size
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp    4 lines
llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll     2 lines

Diff 355517

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

 //===- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass -----------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
[... 823 unchanged lines not shown ...]
 InstructionCost GCNTTIImpl::getCFInstrCost(unsigned Opcode,
[...]
   case Instruction::Switch: {
     auto SI = dyn_cast_or_null<SwitchInst>(I);
     // Each case (including default) takes 1 cmp + 1 cbr instructions in
     // average.
     return (SI ? (SI->getNumCases() + 1) : 4) * (CBrCost + 1);
   }
   case Instruction::Ret:
     return SCost ? 1 : 10;
-  case Instruction::PHI:
-    // TODO: 1. A prediction phi won't be eliminated?
-    // 2. Estimate data copy instructions in this case.
-    return 1;
   }
   return BaseT::getCFInstrCost(Opcode, CostKind, I);
 }

 InstructionCost
 GCNTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
                                        bool IsPairwise,
                                        TTI::TargetCostKind CostKind) {
[... 515 unchanged lines not shown ...]

llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll

 ; RUN: opt -cost-model -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SPEED %s
 ; RUN: opt -cost-model -cost-kind=code-size -analyze -mtriple=amdgcn-unknown-amdhsa < %s | FileCheck --check-prefixes=ALL,SIZE %s

 ; ALL-LABEL: 'test_br_cost'
 ; SPEED: estimated cost of 7 for instruction: br i1
 ; SPEED: estimated cost of 4 for instruction: br label
 ; SPEED: estimated cost of 1 for instruction: %phi = phi i32 [
 ; SPEED: estimated cost of 10 for instruction: ret void
 ; SIZE: estimated cost of 5 for instruction: br i1
 ; SIZE: estimated cost of 1 for instruction: br label
-; SIZE: estimated cost of 1 for instruction: %phi = phi i32 [
+; SIZE: estimated cost of 0 for instruction: %phi = phi i32 [
 ; SIZE: estimated cost of 1 for instruction: ret void
 define amdgpu_kernel void @test_br_cost(i32 addrspace(1)* %out, i32 addrspace(1)* %vaddr, i32 %b) #0 {
 bb0:
   br i1 undef, label %bb1, label %bb2

 bb1:
   %vec = load i32, i32 addrspace(1)* %vaddr
   %add = add i32 %vec, %b
[... 33 unchanged lines not shown ...]