This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fix cost model w.r.t. operand properties
Closed · Public

Authored by reames on Aug 24 2022, 8:12 AM.

Details

Reviewers

ABataev
RKSimon
craig.topper

Commits

rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties

Summary

We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties.

As a motivating example, consider a set of multiplies, each of which multiplies by a constant (but not all by the same constant). Most of the constants are powers of two, but not all.
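For illustration, source code with this shape might look like the sketch below. The function name `scaleLanes` and the specific constants are assumptions chosen for the example, not taken from the patch, and whether SLP actually forms a vector multiply here depends on the target.

```cpp
#include <cstdint>

// Hypothetical example: four adjacent multiplies, each by a constant.
// Three of the constants are powers of two (often cheap as scalar shifts);
// the last one is not.
void scaleLanes(const int32_t *In, int32_t *Out) {
  Out[0] = In[0] * 2;
  Out[1] = In[1] * 4;
  Out[2] = In[2] * 8;
  Out[3] = In[3] * 7; // not a power of two
}
```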

If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization.
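To make the profitability question concrete, here is a minimal, self-contained sketch of two ways to estimate the scalar side of the comparison. It is not the LLVM implementation: `toyScalarMulCost`, both estimator functions, and all cost values are hypothetical stand-ins, and using lane 0 as the single representative query is an assumption made only for this example.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for a target cost query; the numbers are made up.
// Multiplying by a power-of-two constant is modeled as a cheap shift.
static bool isPowerOf2(uint64_t C) { return C != 0 && (C & (C - 1)) == 0; }

static unsigned toyScalarMulCost(uint64_t C) { return isPowerOf2(C) ? 1 : 3; }

// Estimate 1: one representative query (lane 0 here), scaled by the
// number of lanes. Every lane is assumed to look like the first one.
unsigned scalarCostFromOneQuery(const std::vector<uint64_t> &Consts) {
  if (Consts.empty())
    return 0;
  return static_cast<unsigned>(Consts.size()) *
         toyScalarMulCost(Consts.front());
}

// Estimate 2: query each lane with its own operand and sum the results.
unsigned scalarCostPerLane(const std::vector<uint64_t> &Consts) {
  unsigned Cost = 0;
  for (uint64_t C : Consts)
    Cost += toyScalarMulCost(C);
  return Cost;
}
```

With the constants {2, 4, 8, 7}, the first estimate gives 4 and the second gives 6. Against a hypothetical vector cost of 5, vectorization looks unprofitable under the first estimate and profitable under the second; reordering the lanes to {7, 2, 4, 8} makes the first estimate jump to 12 while the second stays at 6.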

This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable.

This is a follow-on to Alexey's D126885, but it also handles the arithmetic instruction case, since the existing code appears to have the same problem.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Aug 24 2022, 8:12 AM

Herald added a project: Restricted Project. Aug 24 2022, 8:12 AM

Herald added subscribers: vporpo, StephenFan, bollu and 2 others.

reames requested review of this revision. Aug 24 2022, 8:12 AM

RKSimon added inline comments. Aug 24 2022, 8:17 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6569	Don't use auto (even though it was used before.....)

reames added inline comments. Aug 24 2022, 8:33 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6569	Why? This is certainly allowed by LLVM coding standards.

Harbormaster completed remote builds in B183120: Diff 455222. Aug 24 2022, 9:26 AM

ping

Herald added a subscriber: pcwang-thead. Sep 6 2022, 8:28 AM

ping x 2

Check D115757; it supports the same kind for all operations.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6569	Allowed if it is easy to determine the type, better to use the actual type where possible.

Defer to reviewer style preference.

Harbormaster completed remote builds in B188053: Diff 462010. Sep 21 2022, 2:43 PM

LGTM but naturally test coverage would be a plus. Have you had any luck with your multiply by (pow2 and non-pow2) constant example?

This revision was not accepted when it landed; it landed in state Needs Review. Sep 23 2022, 8:40 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties.

In D132566#3808619, @RKSimon wrote:

LGTM but naturally test coverage would be a plus. Have you had any luck with your multiply by (pow2 and non-pow2) constant example?

JFYI, I'm not ignoring the request for test coverage. I tried multiple times to come up with a viable test for this, and each time, I'd stumble across an unrelated issue first. There's so many interacting problems here that finding the right combination of input to trip this difference is effectively impossible. I'm going to work through a couple of the other issues, and hopefully as I do, I can start building a reasonable test corpus.

In D132566#3811911, @reames wrote:

JFYI, I'm not ignoring the request for test coverage. I tried multiple times to come up with a viable test for this, and each time, I'd stumble across an unrelated issue first. There's so many interacting problems here that finding the right combination of input to trip this difference is effectively impossible. I'm going to work through a couple of the other issues, and hopefully as I do, I can start building a reasonable test corpus.

I feel your pain! Getting everything to work as you'd expect with costs is never straightforward...

Revision Contents

Path: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Size: 52 lines

Diff 462507

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ ... @@ switch (ShuffleOrOp) {
   case Instruction::SRem:
   case Instruction::FRem:
   case Instruction::Shl:
   case Instruction::LShr:
   case Instruction::AShr:
   case Instruction::And:
   case Instruction::Or:
   case Instruction::Xor: {
-    TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None};
-
-    // Certain instructions can be cheaper to vectorize if they have a
-    // constant second vector operand.
     const unsigned OpIdx = isa<BinaryOperator>(VL0) ? 1 : 0;
-    auto Op2Info = getOperandInfo(VL, OpIdx);
 
-    SmallVector<const Value *, 4> Operands(VL0->operand_values());
-    InstructionCost ScalarEltCost =
-        TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind,
-                                    Op1Info, Op2Info,
-                                    Operands, VL0);
-    if (NeedToShuffleReuses) {
-      CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;
-    }
-    InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;
-    for (unsigned I = 0, Num = VL0->getNumOperands(); I < Num; ++I) {
-      if (all_of(VL, [I](Value *V) {
-            return isConstant(cast<Instruction>(V)->getOperand(I));
-          }))
-        Operands[I] = ConstantVector::getNullValue(VecTy);
-    }
+    InstructionCost ScalarCost = 0;
+    for (auto *V : VL) {
+      auto *VI = cast<Instruction>(V);
+      TTI::OperandValueInfo Op1Info = TTI::getOperandInfo(VI->getOperand(0));
+      TTI::OperandValueInfo Op2Info = TTI::getOperandInfo(VI->getOperand(OpIdx));
+      SmallVector<const Value *, 4> Operands(VI->operand_values());
+      ScalarCost +=
+          TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind,
+                                      Op1Info, Op2Info, Operands, VI);
+    }
+    if (NeedToShuffleReuses) {
+      CommonCost -= (EntryVF - VL.size()) * ScalarCost/VL.size();
+    }
+    TTI::OperandValueInfo Op1Info = getOperandInfo(VL, 0);
+    TTI::OperandValueInfo Op2Info = getOperandInfo(VL, OpIdx);
     InstructionCost VecCost =
         TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind,
-                                    Op1Info, Op2Info,
-                                    Operands, VL0);
+                                    Op1Info, Op2Info);
     LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
     return CommonCost + VecCost - ScalarCost;
   }
   case Instruction::GetElementPtr: {
     TargetTransformInfo::OperandValueKind Op1VK =
         TargetTransformInfo::OK_AnyValue;
     TargetTransformInfo::OperandValueKind Op2VK =
         any_of(VL,
@@ ... @@ case Instruction::Load: {
     return CommonCost + VecLdCost - ScalarLdCost;
   }
   case Instruction::Store: {
     // We know that we can merge the stores. Calculate the cost.
     bool IsReorder = !E->ReorderIndices.empty();
     auto *SI =
         cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);
     Align Alignment = SI->getAlign();
+    InstructionCost ScalarStCost = 0;
+    for (auto *V : VL) {
+      auto *VI = cast<Instruction>(V);
+      TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(VI->getOperand(0));
+      ScalarStCost +=
+          TTI->getMemoryOpCost(Instruction::Store, ScalarTy, Alignment, 0,
+                               CostKind, OpInfo, VI);
+    }
     TTI::OperandValueInfo OpInfo = getOperandInfo(VL, 0);
-    InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
-        Instruction::Store, ScalarTy, Alignment, 0, CostKind, OpInfo, VL0);
-    InstructionCost ScalarStCost = VecTy->getNumElements() * ScalarEltCost;
-    TTI::OperandValueKind OpVK = TTI::OK_AnyValue;
-    if (OpInfo.isConstant())
-      OpVK = TTI::OK_NonUniformConstantValue;
-    InstructionCost VecStCost = TTI->getMemoryOpCost(
-        Instruction::Store, VecTy, Alignment, 0, CostKind,
-        {OpVK, TTI::OP_None}, VL0);
+    InstructionCost VecStCost =
+        TTI->getMemoryOpCost(Instruction::Store, VecTy, Alignment, 0, CostKind,
+                             OpInfo);
     LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecStCost, ScalarStCost));
     return CommonCost + VecStCost - ScalarStCost;
   }
   case Instruction::Call: {
     CallInst *CI = cast<CallInst>(VL0);
     Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
 
     // Calculate the cost of the scalar and vector calls.