This is an archive of the discontinued LLVM Phabricator instance.

Differential D33457

[LV] Update type in cost model for scalarization
ClosedPublic

Authored by mssimpso on May 23 2017, 12:30 PM.

Details

Reviewers

delena
mkuper
Ayal

Summary

For non-uniform instructions marked for scalarization, we should update VectorTy when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step.
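As a rough illustration of the cost scaling described in the summary, the sketch below models an induction update that stays scalar after vectorization: it is costed with the scalar type and multiplied by VF, one copy per induction step. `OpCosts` and `inductionUpdateCost` are hypothetical names invented for this example, not LLVM APIs, and the unit costs are assumed values.

```cpp
#include <cassert>

// Hypothetical stand-in for a target cost query; not part of LLVM.
struct OpCosts {
  unsigned Scalar; // assumed cost of one scalar operation
  unsigned Vector; // assumed cost of one vector operation of width VF
};

// Models the patched arithmetic case: a scalarized update is costed on the
// scalar type and repeated VF times; a widened update is costed once on the
// vector type.
unsigned inductionUpdateCost(const OpCosts &Costs, unsigned VF,
                             bool ScalarAfterVectorization) {
  assert(VF >= 1 && "vectorization factor must be positive");
  unsigned N = ScalarAfterVectorization ? VF : 1;
  unsigned PerOp = ScalarAfterVectorization ? Costs.Scalar : Costs.Vector;
  return N * PerOp;
}
```

Assuming a scalar add cost of 1, a scalarized update at VF = 2 comes out as 2 in this model, the same figure the new test checks for `%i.next`.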

Diff Detail

Build Status

Buildable 6706
Build 6706: arc lint + arc unit

Event Timeline

mssimpso created this revision. May 23 2017, 12:30 PM

Herald added subscribers: javed.absar, mzolotukhin. May 23 2017, 12:30 PM

Ayal added inline comments. May 23 2017, 2:18 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7160	isVectorTy() implies !isVoidTy(), so checking the latter becomes redundant. Does the VF > 1 check also become redundant?

mssimpso added inline comments. May 23 2017, 2:36 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7160	Right, I thought there may be some redundancy here. So we should replace "!VectorTy->isVoidTy()" with "VectorTy->isVectorTy()". Checking that the type is actually a vector is what we really want. I'm not quite sure about "VF > 1". We only care about scalarization if we are vectorizing. I think it might be needed for the "TTI.getNumberOfParts(VectorTy) < VF" check, which wouldn't make sense if VF == 1.
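The difference between the two checks discussed here can be sketched with a toy model. `ModelType`, `oldCheck`, and `newCheck` are hypothetical names used only for illustration; `NumParts` stands in for the value `TTI.getNumberOfParts(VectorTy)` would return, which this sketch simply takes as an input.

```cpp
#include <cassert>

// Toy stand-in for llvm::Type, carrying only the two properties the
// predicate looks at. Hypothetical, for illustration only.
struct ModelType {
  bool IsVoid;
  bool IsVector;
};

// Predicate before the patch: any non-void type passes the type test.
bool oldCheck(unsigned VF, const ModelType &Ty, unsigned NumParts) {
  assert(!(Ty.IsVoid && Ty.IsVector) && "a vector type is never void");
  return VF > 1 && !Ty.IsVoid && NumParts < VF;
}

// Predicate after the patch: only a genuine vector type passes.
bool newCheck(unsigned VF, const ModelType &Ty, unsigned NumParts) {
  assert(!(Ty.IsVoid && Ty.IsVector) && "a vector type is never void");
  return VF > 1 && Ty.IsVector && NumParts < VF;
}
```

Once `VectorTy` can hold a scalar type for scalarized instructions, the old predicate would still report such an instruction as "not scalarized" (a scalar is non-void and occupies one part), while the new predicate does not.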

Addressed Ayal's comment.

Removed !isVoidTy() check since this is implied by isVectorTy().

Curious if you've found benchmarks that are affected by this change?

This revision is now accepted and ready to land. May 23 2017, 3:12 PM

Thanks, Ayal! The change is mostly noise on the hardware (AArch64 Kryo/Falkor) and benchmarks (spec, test-suite) I tested.

For some context, I had been experimenting with a patch that changes the cost model's "tie-break" rule when I noticed the issue. Right now we select the smallest VF with the best score (so scalar wins out if there's no benefit from vectorization). But I was testing a patch that changed this. If any VF > 1 was better than VF == 1, it selected the largest VF having the best score. So we would prefer wider vectors in a tie if any vectorization was better than scalar.
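The two tie-break rules can be sketched as follows. This is a simplified model, not the LoopVectorize implementation: `Candidate`, `selectSmallestBest`, and `selectLargestBest` are hypothetical names, the score is assumed to be a lower-is-better cost, and candidates are assumed to be ordered by increasing VF with VF 1 first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate vectorization factor and its (assumed) lower-is-better score.
struct Candidate {
  unsigned VF;
  double Score;
};

// Current rule as described above: the smallest VF with the best score wins,
// so scalar (VF 1) is kept whenever vectorization brings no benefit.
unsigned selectSmallestBest(const std::vector<Candidate> &Cands) {
  assert(!Cands.empty() && Cands.front().VF == 1 && "expects VF 1 first");
  Candidate Best = Cands.front();
  for (const Candidate &C : Cands)
    if (C.Score < Best.Score)
      Best = C;
  return Best.VF;
}

// Experimental rule as described above: VF 1 is displaced only by a strictly
// better score, but among vector factors a tie goes to the wider one.
unsigned selectLargestBest(const std::vector<Candidate> &Cands) {
  assert(!Cands.empty() && Cands.front().VF == 1 && "expects VF 1 first");
  Candidate Best = Cands.front();
  for (std::size_t Idx = 1; Idx < Cands.size(); ++Idx) {
    const Candidate &C = Cands[Idx];
    bool Better = Best.VF == 1 ? C.Score < Best.Score : C.Score <= Best.Score;
    if (Better)
      Best = C;
  }
  return Best.VF;
}
```

For scores of 4, 3, and 3 at VF 1, 2, and 4, the first rule picks VF 2 and the second picks VF 4. When every score is equal, both keep VF 1.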

For some loops, this interacted badly with the MaxVF determination based on types. Currently, if we know a load or store will be scalarized, we don't consider it when collecting type sizes, which determine the MaxVF. So with the above experimental patch, I was seeing loops where nothing was vectorized being effectively unrolled by the MaxVF, because VectorTy wasn't being set properly.
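The role `VectorTy` plays in rejecting such loops can be sketched with a small model. Each instruction contributes a cost and a flag saying whether its type is really vectorized; a candidate VF is worth considering only if at least one flag is set. `InstCost`, `expectedCost`, and `shouldConsiderVF` are names chosen for this sketch, and the costs are assumed values.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// (cost, type-not-scalarized) for one instruction at a given VF.
using InstCost = std::pair<unsigned, bool>;

// Sums the costs and records whether any instruction keeps a vector type.
InstCost expectedCost(const std::vector<InstCost> &Insts) {
  InstCost Total{0, false};
  for (const InstCost &C : Insts) {
    Total.first += C.first;
    Total.second = Total.second || C.second;
  }
  return Total;
}

// A vector loop of width VF is only a candidate if it would contain at
// least one genuinely vector instruction.
bool shouldConsiderVF(unsigned VF, const std::vector<InstCost> &Insts) {
  assert(VF > 1 && "only vector factors are filtered");
  return expectedCost(Insts).second;
}
```

If a scalarized instruction wrongly reports a vector type, its flag is set and the loop passes this filter even though nothing in it would be vectorized, which is the effect described above.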

Sorry for the long explanation!

Committed in rL303763.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	21 lines
test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll	26 lines

Diff 100000

lib/Transforms/Vectorize/LoopVectorize.cpp

[... 7,149 lines not shown; enclosing context: LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) { ...]

   if (VF > 1 && isProfitableToScalarize(I, VF))
     return VectorizationCostTy(InstsToScalarize[VF][I], false);

   Type *VectorTy;
   unsigned C = getInstructionCost(I, VF, VectorTy);

   bool TypeNotScalarized =
-      VF > 1 && !VectorTy->isVoidTy() && TTI.getNumberOfParts(VectorTy) < VF;
+      VF > 1 && VectorTy->isVectorTy() && TTI.getNumberOfParts(VectorTy) < VF;
   return VectorizationCostTy(C, TypeNotScalarized);
 }

 void LoopVectorizationCostModel::setCostBasedWideningDecision(unsigned VF) {
   if (VF == 1)
     return;
   for (BasicBlock *BB : TheLoop->blocks()) {
     // For each instruction in the old loop.
     for (Instruction &I : *BB) {
       Value *Ptr = getPointerOperand(&I);
[... 64 lines not shown ...]
 }

 unsigned LoopVectorizationCostModel::getInstructionCost(Instruction *I,
                                                         unsigned VF,
                                                         Type *&VectorTy) {
   Type *RetTy = I->getType();
   if (canTruncateToMinimalBitwidth(I, VF))
     RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
-  VectorTy = ToVectorTy(RetTy, VF);
+  VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
   auto SE = PSE.getSE();

   // TODO: We need to estimate the cost of intrinsic calls.
   switch (I->getOpcode()) {
   case Instruction::GetElementPtr:
     // We mark this instruction as zero-cost because the cost of GEPs in
     // vectorized code depends on whether the corresponding memory instruction
     // is scalarized or not. Therefore, we handle GEPs with the memory
[... 116 lines not shown; enclosing context: if (isa<ConstantInt>(Op2)) { ...]
         ConstantInt *CInt = dyn_cast<ConstantInt>(SplatValue);
         if (CInt && CInt->getValue().isPowerOf2())
           Op2VP = TargetTransformInfo::OP_PowerOf2;
         Op2VK = TargetTransformInfo::OK_UniformConstantValue;
       }
     } else if (Legal->isUniform(Op2)) {
       Op2VK = TargetTransformInfo::OK_UniformValue;
     }
     SmallVector<const Value *, 4> Operands(I->operand_values());
-    return TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK,
-                                      Op2VK, Op1VP, Op2VP, Operands);
+    unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;
+    return N * TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK,
+                                          Op2VK, Op1VP, Op2VP, Operands);
   }
   case Instruction::Select: {
     SelectInst *SI = cast<SelectInst>(I);
     const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
     bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
     Type *CondTy = SI->getCondition()->getType();
     if (!ScalarCond)
       CondTy = VectorType::get(CondTy, VF);

     return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, I);
   }
   case Instruction::ICmp:
   case Instruction::FCmp: {
     Type *ValTy = I->getOperand(0)->getType();
     Instruction *Op0AsInstruction = dyn_cast<Instruction>(I->getOperand(0));
     if (canTruncateToMinimalBitwidth(Op0AsInstruction, VF))
       ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]);
     VectorTy = ToVectorTy(ValTy, VF);
     return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr, I);
   }
   case Instruction::Store:
   case Instruction::Load: {
-    VectorTy = ToVectorTy(getMemInstValueType(I), VF);
+    unsigned Width = VF;
+    if (Width > 1) {
+      InstWidening Decision = getWideningDecision(I, Width);
+      assert(Decision != CM_Unknown &&
+             "CM decision should be taken at this point");
+      if (Decision == CM_Scalarize)
+        Width = 1;
+    }
+    VectorTy = ToVectorTy(getMemInstValueType(I), Width);
     return getMemoryInstructionCost(I, VF);
   }
   case Instruction::ZExt:
   case Instruction::SExt:
   case Instruction::FPToUI:
   case Instruction::FPToSI:
   case Instruction::FPExt:
   case Instruction::PtrToInt:
[... 604 lines not shown ...]

test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll

This file was added.

; REQUIRES: asserts
; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -S --debug-only=loop-vectorize 2>&1 | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"

; CHECK-LABEL: all_scalar
; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
;
define void @all_scalar(i64* %a, i64 %n) {
entry:
  br label %for.body

for.body:
  %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
  %tmp0 = getelementptr i64, i64* %a, i64 %i
  store i64 0, i64* %tmp0, align 1
  %i.next = add nuw nsw i64 %i, 2
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %for.end, label %for.body

for.end:
  ret void
}