This is an archive of the discontinued LLVM Phabricator instance.


Differential D110100

[NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types.
Closed, Public

Authored by dfukalov on Sep 20 2021, 12:47 PM.


Details

Reviewers

marels
RKSimon
samparker

Commits

rG1a7b7d7ba232: [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types.

Summary

The pass uses different cost kinds to estimate the "old" and "interleaved" costs:
the old cost is computed with TCK_Latency, while the default cost kind in every
target's override of getInterleavedMemoryOpCost() is TCK_SizeAndLatency. At the
moment the estimated TCK_Latency costs are equal to the TCK_SizeAndLatency ones
(so the change is NFC), but that may change in the future.
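
To make the inconsistency concrete, here is a minimal sketch using a toy cost model. ToyCostKind, ToyOp, toyCost and isProfitable are hypothetical names, and the numbers are invented; none of this is LLVM's TTI API. It only shows that comparing an old cost measured under one kind against a new cost measured under another can give a different answer than measuring both under the same kind.

```cpp
#include <cassert>

// Hypothetical stand-in for TTI::TargetCostKind; not the LLVM enum.
enum class ToyCostKind { Latency, SizeAndLatency };

// Hypothetical per-operation costs. The values used with this struct are
// invented purely to show a case where the two kinds disagree.
struct ToyOp {
  int Latency;
  int SizeAndLatency;
};

// Return the cost of Op under the requested cost kind.
int toyCost(const ToyOp &Op, ToyCostKind Kind) {
  return Kind == ToyCostKind::Latency ? Op.Latency : Op.SizeAndLatency;
}

// Mirrors the shape of the check in the pass: the transform is only
// accepted when the new cost is strictly lower than the old cost.
bool isProfitable(const ToyOp &Old, ToyCostKind OldKind, const ToyOp &New,
                  ToyCostKind NewKind) {
  return toyCost(New, NewKind) < toyCost(Old, OldKind);
}
```

With Old = {4, 6} and New = {3, 5}, the mixed comparison (old under Latency, new under SizeAndLatency) rejects the transform because 5 is not less than 4, while the consistent comparison under SizeAndLatency accepts it because 5 is less than 6.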

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dfukalov created this revision. Sep 20 2021, 12:47 PM

Herald added subscribers: hiraditya, kristof.beyls. Sep 20 2021, 12:47 PM

dfukalov requested review of this revision. Sep 20 2021, 12:47 PM

Herald added a project: Restricted Project. Sep 20 2021, 12:47 PM

Harbormaster completed remote builds in B124743: Diff 373697. Sep 20 2021, 1:27 PM

RKSimon added a reviewer: samparker. Sep 21 2021, 12:55 PM

Looks reasonable in general, thanks!

default cost kind for all targets override getInterleavedMemoryOpCost() is TCK_SizeAndLatency

Interesting, it looks like the top-level define in TTI defaults to TTI::TCK_RecipThroughput, so should this trump the default in the target-specific defines?

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/TargetTransformInfo.h#L1155

Do you think we should stop using default cost kinds entirely?

Do you think we should stop using default cost kinds entirely?

I think this would be for the best, and I thought I already changed most of the calls to be explicit.
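
The question of which default "wins" follows a general C++ rule: a default argument is chosen from the static type used at the call site, not from the override that ends up running. The sketch below illustrates only that rule. ToyInterface, ToyTarget and kindUsed are hypothetical names, and LLVM's actual TTI dispatch differs in detail, so treat this as an analogy rather than a description of TTI.

```cpp
#include <cassert>

// Hypothetical stand-in for TTI::TargetCostKind; not the LLVM enum.
enum class ToyCostKind { RecipThroughput, SizeAndLatency };

struct ToyInterface {
  virtual ~ToyInterface() = default;
  // Default chosen by the top-level interface.
  virtual ToyCostKind
  kindUsed(ToyCostKind Kind = ToyCostKind::RecipThroughput) const {
    return Kind;
  }
};

struct ToyTarget : ToyInterface {
  // The override declares a different default. It is only picked up when
  // the call is made through the ToyTarget type itself.
  ToyCostKind
  kindUsed(ToyCostKind Kind = ToyCostKind::SizeAndLatency) const override {
    return Kind;
  }
};
```

Calling kindUsed() through a `const ToyInterface &` bound to a ToyTarget runs the override but receives RecipThroughput, while calling it on the ToyTarget object directly receives SizeAndLatency. Passing the cost kind explicitly at every call site removes this ambiguity.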

In D110100#3014729, @samparker wrote:

Do you think we should stop using default cost kinds entirely?

I think this would be for the best, and I thought I already changed most of the calls to be explicit.

@dfukalov Is this something you'd be willing to do in this patch? At least for getInstructionCost + getInterleavedMemoryOpCost (TTI + Impl)?

In D110100#3014761, @RKSimon wrote:

In D110100#3014729, @samparker wrote:

Do you think we should stop using default cost kinds entirely?

I think this would be for the best, and I thought I already changed most of the calls to be explicit.

@dfukalov Is this something you'd be willing to do in this patch? At least for getInstructionCost + getInterleavedMemoryOpCost (TTI + Impl)?

I think the suggested changes belong in a separate patch. I was just checking who uses the pure latency cost kind these days and found this inconsistency.
Perhaps the fix could be "less invasive" by using TCK_Latency in both places, but the two seem to be equal at the moment, and I'm not sure the author really meant pure latency here...

RKSimon mentioned this in D110242: [Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides. Sep 22 2021, 6:17 AM

RKSimon mentioned this in rGb1f38a27f0c9: [Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides. Sep 22 2021, 7:35 AM

LGTM with one (optional) minor

llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp
1133–1134	Would it make sense to add a common CostKind variable here and update all the getCost calls to use that? We already do something similar in SLP:
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency;
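
The suggestion can be sketched in isolation as follows. All names here (ToyCostKind, toyInstructionCost, toyInterleavedCost, shouldCombine) are hypothetical and the cost formulas are invented; the point is only the pattern of declaring one cost kind and passing it to every query, with no default argument to fall back on.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for TTI::TargetCostKind; not the LLVM enum.
enum class ToyCostKind { Latency, SizeAndLatency };

// Hypothetical cost queries. Neither has a default cost kind, so every
// call site has to choose one explicitly. The formulas are invented.
int toyInstructionCost(int Op, ToyCostKind Kind) {
  return Kind == ToyCostKind::Latency ? Op : Op + 1;
}

int toyInterleavedCost(int Factor, ToyCostKind Kind) {
  return Kind == ToyCostKind::Latency ? Factor : 2 * Factor;
}

// One CostKind is declared up front and used for both the old and the new
// cost, so the final comparison is always between like quantities.
bool shouldCombine(const std::vector<int> &Ops, int Factor) {
  const ToyCostKind CostKind = ToyCostKind::SizeAndLatency;

  int OldCost = 0;
  for (int Op : Ops)
    OldCost += toyInstructionCost(Op, CostKind);

  int NewCost = toyInterleavedCost(Factor, CostKind);
  return NewCost < OldCost;
}
```

Changing the cost kind later then means editing a single line rather than hunting for every call.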

This revision is now accepted and ready to land. Sep 22 2021, 8:12 AM

This revision was landed with ongoing or failed builds. Sep 22 2021, 10:16 AM

Closed by commit rG1a7b7d7ba232: [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types. (authored by dfukalov).

This revision was automatically updated to reflect the committed changes.

dfukalov added a commit: rG1a7b7d7ba232: [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types..

dfukalov added inline comments. Sep 22 2021, 10:17 AM

llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp
1133–1134	Good point, thanks!

Revision Contents

Path: llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp
Size: 6 lines

Diff 374282

llvm/lib/CodeGen/InterleavedLoadCombinePass.cpp

[1,124 lines not shown; context: bool InterleavedLoadCombineImpl::combine(std::list<VectorInfo> &InterleavedLoad,]
   if (!InsertionPoint)
     return false;

   std::set<LoadInst *> LIs;
   std::set<Instruction *> Is;
   std::set<Instruction *> SVIs;

   InstructionCost InterleavedCost;
   InstructionCost InstructionCost = 0;
+  const TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency;

   // Get the interleave factor
   unsigned Factor = InterleavedLoad.size();

   // Merge all input sets used in analysis
   for (auto &VI : InterleavedLoad) {
     // Generate a set of all load instructions to be combined
     LIs.insert(VI.LIs.begin(), VI.LIs.end());
[11 lines not shown; context: bool InterleavedLoadCombineImpl::combine(std::list<VectorInfo> &InterleavedLoad,]
   if (LIs.size() < 2)
     return false;

   // Test if all participating instruction will be dead after the
   // transformation. If intermediate results are used, no performance gain can
   // be expected. Also sum the cost of the Instructions beeing left dead.
   for (auto &I : Is) {
     // Compute the old cost
-    InstructionCost +=
-        TTI.getInstructionCost(I, TargetTransformInfo::TCK_Latency);
+    InstructionCost += TTI.getInstructionCost(I, CostKind);

     // The final SVIs are allowed not to be dead, all uses will be replaced
     if (SVIs.find(I) != SVIs.end())
       continue;

     // If there are users outside the set to be eliminated, we abort the
     // transformation. No gain can be expected.
     for (auto *U : I->users()) {
[36 lines not shown; context: unsigned ElementsPerSVI =]
           ->getNumElements();
   FixedVectorType *ILTy = FixedVectorType::get(ETy, Factor * ElementsPerSVI);

   SmallVector<unsigned, 4> Indices;
   for (unsigned i = 0; i < Factor; i++)
     Indices.push_back(i);
   InterleavedCost = TTI.getInterleavedMemoryOpCost(
       Instruction::Load, ILTy, Factor, Indices, InsertionPoint->getAlign(),
-      InsertionPoint->getPointerAddressSpace());
+      InsertionPoint->getPointerAddressSpace(), CostKind);

   if (InterleavedCost >= InstructionCost) {
     return false;
   }

   // Create a pointer cast for the wide load.
   auto CI = Builder.CreatePointerCast(InsertionPoint->getOperand(0),
                                       ILTy->getPointerTo(),
[143 lines not shown]