This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
2/5
LoopVectorize.cpp
-
test/Transforms/LoopVectorize/AArch64/
-
Transforms/
-
LoopVectorize/
-
AArch64/
-
no_vector_instructions.ll

Differential D98512

[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
ClosedPublic

Authored by david-arm on Mar 12 2021, 7:50 AM.

Download Raw Diff

Details

Reviewers

sdesmalen
bsmith
craig.topper
kmclaughlin
dmgreen
fhahn

Commits

rG240aa96cf25d: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost

Summary

This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

VF is a scalar, in which case N = 1.
VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision.Mar 12 2021, 7:50 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptMar 12 2021, 7:50 AM

david-arm requested review of this revision.Mar 12 2021, 7:50 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 7:50 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

david-arm added a reviewer: dmgreen.Mar 12 2021, 7:54 AM

david-arm mentioned this in D98054: [LoopVectorize][SVE] Fix crash when vectorising FP negation.Mar 12 2021, 8:22 AM

david-arm added a child revision: D98054: [LoopVectorize][SVE] Fix crash when vectorising FP negation.

Harbormaster completed remote builds in B93502: Diff 330247.Mar 12 2021, 9:53 AM

I agree that the IV update in the test should have a cost of 1. (It seems strange it doesn't already, I suspect it isn't matched the same as an increment 1).

Would it always be true for the addresses of scalarized loads/stores? They do need to calculate N addresses.

This didn't show any change in the tests I ran on it, but there is potentially not a lot in there that would scalarize memops.

Hi @dmgreen, thanks for taking a look at the patch. So you raise a good point - the loads/stores will be dealt with before reaching this variant getInstructionCost, however the GEPs/Bitcasts for those will still be costed here. However, for GEPs at least you can see this code in getInstructionCost:

case Instruction::GetElementPtr:
  // We mark this instruction as zero-cost because the cost of GEPs in
  // vectorized code depends on whether the corresponding memory instruction
  // is scalarized or not. Therefore, we handle GEPs with the memory
  // instruction cost.
  return 0;

which only leaves us with Bitcasts with pointer type results. Normal ptr-ptr bitcasts will be 0 cost I think, so I imagine what's left are bitcasts of non-ptr types to ptr types. It's very unfortunate that the Scalars list includes so many different types of Scalars that need costing differently. What I could do here is have a single special case for Bitcasts where the cost (likely 0) is multiplied by VF, with an assertion that we're not dealing with scalable vectors.

I guess for bitcasts the cost may be non-trivial for big endian targets.

In D98512#2624996, @dmgreen wrote:

I agree that the IV update in the test should have a cost of 1. (It seems strange it doesn't already, I suspect it isn't matched the same as an increment 1).

Would it always be true for the addresses of scalarized loads/stores? They do need to calculate N addresses.

Hmm, but if there are uses of the IV in this case, don't we need to generate a scalar value for each element in the VF? E.g consider the example below

define void @test(half* %b, i32 %n) {
entry:
  br label %for.body

for.body:
  %i = phi i32 [ 0, %entry ], [ %i.next, %for.body ]
  %tmp0 = getelementptr half, half* %b, i32 %i
  %lv = load half, half* %tmp0
  %add = fadd half %lv, 40.0
  store half %add, half* %tmp0, align 1
  %i.next = add nuw nsw i32 %i, 3
  %cond = icmp eq i32 %i.next, %n
  br i1 %cond, label %for.end, label %for.body

for.end:
  ret void
}

Generates the vector body below with -S -force-vector-width=4 -mtriple=arm64-apple-iphoneos. Note %5 = add i32 %offset.idx, 0, %6 = add i32 %offset.idx, 3, %7 = add i32 %offset.idx, 6 & %8 = add i32 %offset.idx, 9. With this change, the cost of the induction increment is 1, but scalarization requires us to create 4 scalar increments. I don't think the cost of that is accounted elsewhere?

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %offset.idx = mul i32 %index, 3
  %5 = add i32 %offset.idx, 0
  %6 = add i32 %offset.idx, 3
  %7 = add i32 %offset.idx, 6
  %8 = add i32 %offset.idx, 9
  %9 = getelementptr half, half* %b, i32 %5
  %10 = getelementptr half, half* %b, i32 %6
  %11 = getelementptr half, half* %b, i32 %7
  %12 = getelementptr half, half* %b, i32 %8
  %13 = getelementptr half, half* %9, i32 0
  %14 = bitcast half* %13 to <12 x half>*
  %wide.vec = load <12 x half>, <12 x half>* %14, align 2
  %strided.vec = shufflevector <12 x half> %wide.vec, <12 x half> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
  %15 = fadd <4 x half> %strided.vec, <half 0xH5100, half 0xH5100, half 0xH5100, half 0xH5100>
  %16 = extractelement <4 x half> %15, i32 0
  store half %16, half* %9, align 1
  %17 = extractelement <4 x half> %15, i32 1
  store half %17, half* %10, align 1
  %18 = extractelement <4 x half> %15, i32 2
  store half %18, half* %11, align 1
  %19 = extractelement <4 x half> %15, i32 3
  store half %19, half* %12, align 1
  %index.next = add i32 %index, 4
  %20 = icmp eq i32 %index.next, %n.vec
  br i1 %20, label %middle.block, label %vector.body, !llvm.loop !0

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7413–7415	I think conceptually there's nothing preventing us from reaching this code with an instruction that needs to be scalarized if the VF > 1. It might not occur in practice for some/most backends at the moment, because most of the opcodes above can be widened on most backends and we do not really properly track which instructions will be scalarized at this point, except for pointer & induction variables. If we make the code less general, I think we should have at least an assertion to make sure we never reach this code when VF > 1 && scalarization is requested, so we at least notice once it really matters and we need to be more general again.

Thanks for the review @fhahn. If anything this just highlights the poor code coverage we have at the moment for cost calculations when scalarising. At the very least I'll add some tests that cover some of these corner cases, since there was nothing that really broke when I made this change!

Although I think the adds you mentioned above @fhahn don't fall into the Scalars variable - I'll double check this though. This patch only affects instructions that are members of Scalars. Users of the IV don't fall into that - only updates to the IV I think.

Hi @fhahn, I tested the example you pasted above before and after applying my patch and got this debug output:

Before:
LV: Found an estimated cost of 0 for VF 4 For instruction:   %i = phi i32 [ 0, %entry ], [ %i.next, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %tmp0 = getelementptr half, half* %b, i32 %i
LV: Found an estimated cost of 3 for VF 4 For instruction:   %lv = load half, half* %tmp0, align 2
LV: Found an estimated cost of 2 for VF 4 For instruction:   %add = fadd half %lv, 0xH5100
LV: Found an estimated cost of 23 for VF 4 For instruction:   store half %add, half* %tmp0, align 1
LV: Found an estimated cost of 4 for VF 4 For instruction:   %i.next = add nuw nsw i32 %i, 3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %cond = icmp eq i32 %i.next, %n
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %cond, label %for.end, label %for.body

After:
LV: Found an estimated cost of 0 for VF 4 For instruction:   %i = phi i32 [ 0, %entry ], [ %i.next, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction:   %tmp0 = getelementptr half, half* %b, i32 %i
LV: Found an estimated cost of 3 for VF 4 For instruction:   %lv = load half, half* %tmp0, align 2
LV: Found an estimated cost of 2 for VF 4 For instruction:   %add = fadd half %lv, 0xH5100
LV: Found an estimated cost of 23 for VF 4 For instruction:   store half %add, half* %tmp0, align 1
LV: Found an estimated cost of 1 for VF 4 For instruction:   %i.next = add nuw nsw i32 %i, 3
LV: Found an estimated cost of 1 for VF 4 For instruction:   %cond = icmp eq i32 %i.next, %n
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %cond, label %for.end, label %for.body

It looks like the only change is the IV update which goes from 4 -> 1. I think the adds you mentioned are included as part of the load/store cost.

In D98512#2625701, @david-arm wrote:

It looks like the only change is the IV update which goes from 4 -> 1. I think the adds you mentioned are included as part of the load/store cost.

I am not sure. From a quick look at the code, it seems like the high load/store cost comes mostly from the required scalarization of the stored/loaded values and the cost of the extra increments seems not fully included at least on the IR instruction level. Unfortunately I don't think there's really any different option to write a test that hits the code paths in the patch, even if the scalarized values are used in other places later on. So it's probably not worth worrying too much about it. I still think we should at least have an assertion as mentioned above, to catch the gap if the cost-modeling is improved.

In D98512#2625736, @fhahn wrote:

In D98512#2625701, @david-arm wrote:

It looks like the only change is the IV update which goes from 4 -> 1. I think the adds you mentioned are included as part of the load/store cost.

I am not sure. From a quick look at the code, it seems like the high load/store cost comes mostly from the required scalarization of the stored/loaded values and the cost of the extra increments seems not fully included at least on the IR instruction level. Unfortunately I don't think there's really any different option to write a test that hits the code paths in the patch, even if the scalarized values are used in other places later on. So it's probably not worth worrying too much about it. I still think we should at least have an assertion as mentioned above, to catch the gap if the cost-modeling is improved.

Hi @fhahn I guess I'm just a bit confused as to what assert you're asking for here to be honest. We do reach the Add case for the scalar IV increment (that stays scalar after vectorisation) that happens in your test case above, where VF > 1. What assert are you proposing here? Something like:

assert(VF == 1 || !isScalarAfterVectorization(Instruction)); ?

If so, I think that this is guaranteed to fail with your test case. Alternatively, something like this should work I think:

assert(VF == 1 || !InstsToScalarize.count(Instruction));

since scalarised vector instructions should not get into this function.

In your test above the other adds that appear near the start of the loop after vectorisation I believe are due to expansion of the GEP during vectorisation, rather than at the cost phase. The cost for the GEP is explicitly 0 in getInstructionCost and the comment says that the cost is calculated as part of the load/store cost. The purpose of this patch isn't really to fix/improve the cost calculation of loads/stores by taking the extra generated adds into account - it's mainly just to simplify the calculation of costs for the few cases where isScalarAfterVectorization returns true.

In D98512#2626004, @david-arm wrote:
In D98512#2625736, @fhahn wrote:

In D98512#2625701, @david-arm wrote:

It looks like the only change is the IV update which goes from 4 -> 1. I think the adds you mentioned are included as part of the load/store cost.

I am not sure. From a quick look at the code, it seems like the high load/store cost comes mostly from the required scalarization of the stored/loaded values and the cost of the extra increments seems not fully included at least on the IR instruction level. Unfortunately I don't think there's really any different option to write a test that hits the code paths in the patch, even if the scalarized values are used in other places later on. So it's probably not worth worrying too much about it. I still think we should at least have an assertion as mentioned above, to catch the gap if the cost-modeling is improved.

Hi @fhahn I guess I'm just a bit confused as to what assert you're asking for here to be honest. We do reach the Add case for the scalar IV increment (that stays scalar after vectorisation) that happens in your test case above, where VF > 1. What assert are you proposing here? Something like:
assert(VF == 1 || !isScalarAfterVectorization(Instruction)); ?
If so, I think that this is guaranteed to fail with your test case. Alternatively, something like this should work I think:
assert(VF == 1 || !InstsToScalarize.count(Instruction));
since scalarised vector instructions should not get into this function.

Yes I think if we would use !isScalarAfterVectorization(Instruction) in the assertion, we would at least need to also allow VF != 1 for induction updates, but that's going to be a bit hard to check. So checking InstsToScalarize is probably the best option for now

In your test above the other adds that appear near the start of the loop after vectorisation I believe are due to expansion of the GEP during vectorisation, rather than at the cost phase. The cost for the GEP is explicitly 0 in getInstructionCost and the comment says that the cost is calculated as part of the load/store cost. The purpose of this patch isn't really to fix/improve the cost calculation of loads/stores by taking the extra generated adds into account - it's mainly just to simplify the calculation of costs for the few cases where isScalarAfterVectorization returns true.

That should be fine. Could you also add a comment explaining why we do not need to scale if isScalarAfterVectorization currently.

Added an assert that if an instruction remains scalar after vectorisation, then both the instruction and all it's users will not be scalarised. If any users were to be scalarised then potentially that means multiple copies of the instruction will be generated. In the case of GEPs the cost is actually calculated as part of the load/store cost.

david-arm edited the summary of this revision. (Show Details)Mar 22 2021, 5:30 AM

david-arm edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B94968: Diff 332257.Mar 22 2021, 6:16 AM

LGTM, thanks! Please wait a day or 2 with committing in case there are additional comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7253	nit: Change to have an early exit for `VF.isScalar`?
7267	nit: I'd just define `hasSingleCopyAfterVectorization` as lambda here, because the comment already explains the intent.

This revision is now accepted and ready to land.Mar 23 2021, 2:29 PM

sdesmalen added inline comments.Mar 24 2021, 2:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7250	I'd prefer to either remove the ifndef here so that it's always defined (and the compiler can optimize it away if it isn't called), or implementing the lambda suggestion from Florian.
7560–7561	This should probably return `TTI.getArithmeticInstrCost(..)` now, because the VF.getFixedValue() multiplier and possible scalarization overhead are now represented as part of `VectorTy` and the call to `TTI.getArithmeticInstrCost`. There be no reason why this default opcode should behave differently from the other opcodes now that you've done the work to verify the right type is passed to the `TTI.get`*Cost calls.

david-arm updated this revision to Diff 333001.Mar 24 2021, 8:16 AM

david-arm marked 2 inline comments as done.

Harbormaster completed remote builds in B95498: Diff 333001.Mar 24 2021, 2:53 PM

This revision was landed with ongoing or failed builds.Mar 26 2021, 4:27 AM

Closed by commit rG240aa96cf25d: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost (authored by david-arm). · Explain Why

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG240aa96cf25d: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost.

david-arm added a reverting change: rGc39460cc4f7c: Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost".Mar 26 2021, 4:37 AM

david-arm mentioned this in D99718: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost.Apr 1 2021, 4:17 AM

david-arm removed a child revision: D98054: [LoopVectorize][SVE] Fix crash when vectorising FP negation.Apr 1 2021, 4:20 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

63 lines

test/

Transforms/

LoopVectorize/

AArch64/

no_vector_instructions.ll

2 lines

Diff 333532

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 7,241 Lines • ▼ Show 20 Lines	if (isa<LoadInst>(I)) {
}		}
} else		} else
// Make sure I gets scalarized and a cost estimate without		// Make sure I gets scalarized and a cost estimate without
// scalarization overhead.		// scalarization overhead.
ForcedScalars[VF].insert(I);		ForcedScalars[VF].insert(I);
}		}
}		}

InstructionCost		InstructionCost
		sdesmalenUnsubmitted Not Done Reply Inline Actions I'd prefer to either remove the ifndef here so that it's always defined (and the compiler can optimize it away if it isn't called), or implementing the lambda suggestion from Florian. sdesmalen: I'd prefer to either remove the ifndef here so that it's always defined (and the compiler can…
LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,		LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *&VectorTy) {		Type *&VectorTy) {
Type *RetTy = I->getType();		Type *RetTy = I->getType();
		fhahnUnsubmitted Done Reply Inline Actions nit: Change to have an early exit for `VF.isScalar`? fhahn: nit: Change to have an early exit for `VF.isScalar`?
if (canTruncateToMinimalBitwidth(I, VF))		if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);		RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();		auto SE = PSE.getSE();
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

		auto hasSingleCopyAfterVectorization = [this](Instruction *I,
		ElementCount VF) -> bool {
		if (VF.isScalar())
		return true;

		auto Scalarized = InstsToScalarize.find(VF);
		assert(Scalarized != InstsToScalarize.end() &&
		"VF not yet analyzed for scalarization profitability");
		return !Scalarized->second.count(I) &&
		fhahnUnsubmitted Done Reply Inline Actions nit: I'd just define `hasSingleCopyAfterVectorization` as lambda here, because the comment already explains the intent. fhahn: nit: I'd just define `hasSingleCopyAfterVectorization` as lambda here, because the comment…
		llvm::all_of(I->users(), [&](User *U) {
		auto *UI = cast<Instruction>(U);
		return !Scalarized->second.count(UI);
		});
		};

		if (isScalarAfterVectorization(I, VF)) {
		VectorTy = RetTy;
		// With the exception of GEPs, after scalarization there should only be one
		// copy of the instruction generated in the loop. This is because the VF is
		// either 1, or any instructions that need scalarizing have already been
		// dealt with by the the time we get here. As a result, it means we don't
		// have to multiply the instruction cost by VF.
		assert(I->getOpcode() == Instruction::GetElementPtr \|\|
		hasSingleCopyAfterVectorization(I, VF));
		} else
		VectorTy = ToVectorTy(RetTy, VF);

// TODO: We need to estimate the cost of intrinsic calls.		// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {		switch (I->getOpcode()) {
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
// We mark this instruction as zero-cost because the cost of GEPs in		// We mark this instruction as zero-cost because the cost of GEPs in
// vectorized code depends on whether the corresponding memory instruction		// vectorized code depends on whether the corresponding memory instruction
// is scalarized or not. Therefore, we handle GEPs with the memory		// is scalarized or not. Therefore, we handle GEPs with the memory
// instruction cost.		// instruction cost.
return 0;		return 0;
▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	case Instruction::Xor: {
Value *Op2 = I->getOperand(1);		Value *Op2 = I->getOperand(1);
TargetTransformInfo::OperandValueProperties Op2VP;		TargetTransformInfo::OperandValueProperties Op2VP;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TTI.getOperandInfo(Op2, Op2VP);		TTI.getOperandInfo(Op2, Op2VP);
if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))		if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))
Op2VK = TargetTransformInfo::OK_UniformValue;		Op2VK = TargetTransformInfo::OK_UniformValue;

SmallVector<const Value *, 4> Operands(I->operand_values());		SmallVector<const Value *, 4> Operands(I->operand_values());
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;		return TTI.getArithmeticInstrCost(
return N * TTI.getArithmeticInstrCost(		I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
I->getOpcode(), VectorTy, CostKind,
TargetTransformInfo::OK_AnyValue,
Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);		Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
		fhahnUnsubmitted Not Done Reply Inline Actions I think conceptually there's nothing preventing us from reaching this code with an instruction that needs to be scalarized if the VF > 1. It might not occur in practice for some/most backends at the moment, because most of the opcodes above can be widened on most backends and we do not really properly track which instructions will be scalarized at this point, except for pointer & induction variables. If we make the code less general, I think we should have at least an assertion to make sure we never reach this code when VF > 1 && scalarization is requested, so we at least notice once it really matters and we need to be more general again. fhahn: I think conceptually there's nothing preventing us from reaching this code with an instruction…
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;		return TTI.getArithmeticInstrCost(
return N * TTI.getArithmeticInstrCost(		I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
I->getOpcode(), VectorTy, CostKind,		TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
TargetTransformInfo::OK_AnyValue,		TargetTransformInfo::OP_None, I->getOperand(0), I);
TargetTransformInfo::OK_AnyValue,
TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
I->getOperand(0), I);
}		}
case Instruction::Select: {		case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);		SelectInst *SI = cast<SelectInst>(I);
const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());		const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));		bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
Type *CondTy = SI->getCondition()->getType();		Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)		if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);		CondTy = VectorType::get(CondTy, VF);
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	if (canTruncateToMinimalBitwidth(I, VF)) {
largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
} else if (Opcode == Instruction::ZExt \|\| Opcode == Instruction::SExt) {		} else if (Opcode == Instruction::ZExt \|\| Opcode == Instruction::SExt) {
SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);		SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);
VectorTy =		VectorTy =
smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
}		}
}		}

unsigned N;		return TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
if (isScalarAfterVectorization(I, VF)) {
assert(!VF.isScalable() && "VF is assumed to be non scalable");
N = VF.getKnownMinValue();
} else
N = 1;
return N *
TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}		}
case Instruction::Call: {		case Instruction::Call: {
bool NeedToScalarize;		bool NeedToScalarize;
CallInst *CI = cast<CallInst>(I);		CallInst *CI = cast<CallInst>(I);
InstructionCost CallCost = getVectorCallCost(CI, VF, NeedToScalarize);		InstructionCost CallCost = getVectorCallCost(CI, VF, NeedToScalarize);
if (getVectorIntrinsicIDForCall(CI, TLI)) {		if (getVectorIntrinsicIDForCall(CI, TLI)) {
InstructionCost IntrinsicCost = getVectorIntrinsicCost(CI, VF);		InstructionCost IntrinsicCost = getVectorIntrinsicCost(CI, VF);
return std::min(CallCost, IntrinsicCost);		return std::min(CallCost, IntrinsicCost);
}		}
return CallCost;		return CallCost;
}		}
case Instruction::ExtractValue:		case Instruction::ExtractValue:
return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);		return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
default:		default:
// The cost of executing VF copies of the scalar instruction. This opcode		// This opcode is unknown. Assume that it is the same as 'mul'.
		sdesmalenUnsubmitted Not Done Reply Inline Actions This should probably return `TTI.getArithmeticInstrCost(..)` now, because the VF.getFixedValue() multiplier and possible scalarization overhead are now represented as part of `VectorTy` and the call to `TTI.getArithmeticInstrCost`. There be no reason why this default opcode should behave differently from the other opcodes now that you've done the work to verify the right type is passed to the `TTI.get`Cost calls. sdesmalen:* This should probably return `TTI.getArithmeticInstrCost(..)` now, because the VF.getFixedValue…
// is unknown. Assume that it is the same as 'mul'.		return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
Instruction::Mul, VectorTy, CostKind) +
getScalarizationOverhead(I, VF);
} // end of switch.		} // end of switch.
}		}

char LoopVectorize::ID = 0;		char LoopVectorize::ID = 0;

static const char lv_name[] = "Loop Vectorization";		static const char lv_name[] = "Loop Vectorization";

INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false)		INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false)
▲ Show 20 Lines • Show All 2,404 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -S -debug-only=loop-vectorize 2>&1 \| FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	; CHECK-LABEL: all_scalar			; CHECK-LABEL: all_scalar
	; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2			; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
	; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2			; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
	; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions			; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
	;			;
	define void @all_scalar(i64* %a, i64 %n) {			define void @all_scalar(i64* %a, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	Show All 32 Lines