Page MenuHomePhabricator

[LV] Exclude loop-invariant inputs from scalar cost computation.
Needs ReviewPublic

Authored by fhahn on Mar 29 2019, 8:03 AM.

Details

Summary

Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.

Fixes PR41294

Event Timeline

fhahn created this revision.Mar 29 2019, 8:03 AM
Ayal added a subscriber: Ayal.Apr 5 2019, 1:03 AM

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?

Note that there's also CM.isProfitableToScalarize(I, VF)), but it relies on having computed costs, so difficult to use when computing the cost (of a User). Skipping it would only affect accuracy of resulting cost, considering Operands that can be vectorized but will not be due to profitability.

Fixing getVectorCallCost deserves another testcase.
Seems like getVectorIntrinsicCost also requires fixing?

Would be good to hoist such invariant instructions

%a = extractvalue { i64, i64 } %sv, 0
%b = extractvalue { i64, i64 } %sv, 1

out of the loop before LV, or at-least have LV recognize them as uniform.

llvm/test/Transforms/LoopVectorize/AArch64/instructions-with-struct-ops.ll
3 ↗(On Diff #192826)

Record PR41294 in comment, file name, or testcase function name.

4 ↗(On Diff #192826)

for extractelement >> for extractvalue

10 ↗(On Diff #192826)

the extractelement >> the extractvalue

fhahn planned changes to this revision.Apr 10 2019, 1:03 PM

Thanks for your comments Ayal! I'll update the patch in a bit.

fhahn updated this revision to Diff 198430.May 7 2019, 3:54 AM

Split out moving various functions to LoopVectorizationCostModel to D61638.

Hoisted needsExtract to do the check if we need to extract/scalarize an operand
to LoopVectorizationCostModel and extend it to check for loop invariant operands.

fhahn marked 3 inline comments as done.May 7 2019, 4:07 AM

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?

Done.

Note that there's also CM.isProfitableToScalarize(I, VF)), but it relies on having computed costs, so difficult to use when computing the cost (of a User). Skipping it would only affect accuracy of resulting cost, considering Operands that can be vectorized but will not be due to profitability.

Fixing getVectorCallCost deserves another testcase.

Added a test case with call. But thinking about it again, it does not really test the issue. Not sure it is actually possible to test getVectorCallCost, as there are no vector call functions that take struct values?

Seems like getVectorIntrinsicCost also requires fixing?

Yep, I'll update it in a second. In practice, I don't think there are any intrinsics that take struct types, but maybe in the future it might become a problem.

Would be good to hoist such invariant instructions

%a = extractvalue { i64, i64 } %sv, 0
%b = extractvalue { i64, i64 } %sv, 1

out of the loop before LV, or at-least have LV recognize them as uniform.

Yep, I can look into that as a follow up. LICM should hoist those things, but I think in general we cannot guarantee that all loop-invariant instructions are hoisted out before LoopVectorize (the test case came from a fuzzer). Do you think we should try to hoist them in LV? I would assume a later run of LICM will hoist them.

fhahn updated this revision to Diff 198433.May 7 2019, 4:10 AM

Filter operands for getIntrinsicCallCost as well.

fhahn added a reviewer: Ayal.May 14 2019, 12:32 PM

Ping. Ayal, does this address your comments appropriately?

Ayal added a comment.May 14 2019, 3:06 PM

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?

Done.

Very good, thanks for the refactoring.

Note that there's also CM.isProfitableToScalarize(I, VF)), but it relies on having computed costs, so difficult to use when computing the cost (of a User). Skipping it would only affect accuracy of resulting cost, considering Operands that can be vectorized but will not be due to profitability.

Fixing getVectorCallCost deserves another testcase.

Added a test case with call. But thinking about it again, it does not really test the issue. Not sure it is actually possible to test getVectorCallCost, as there are no vector call functions that take struct values?

Good point. Does the test case pass w/o this patch?

Seems like getVectorIntrinsicCost also requires fixing?

Yep, I'll update it in a second. In practice, I don't think there are any intrinsics that take struct types, but maybe in the future it might become a problem.

Would be good to hoist such invariant instructions

%a = extractvalue { i64, i64 } %sv, 0
%b = extractvalue { i64, i64 } %sv, 1

out of the loop before LV, or at-least have LV recognize them as uniform.

Yep, I can look into that as a follow up. LICM should hoist those things, but I think in general we cannot guarantee that all loop-invariant instructions are hoisted out before LoopVectorize (the test case came from a fuzzer). Do you think we should try to hoist them in LV? I would assume a later run of LICM will hoist them.

The thought was to check LICM, and check LV's uniform analysis.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1335

This is admittedly part of the original comment, but while we're here - "cannot be scalarized" >> "is expected to be vectorized" - every V can be scalarized. Furthermore, if we're unsure whether V will be vectorized or not it's safer to assume that it will be scalarized rather than vectorized, as the latter might not be possible. So if Scalars.find(VF) isn't set yet, which will cause isScalarAfterVectorization() to assert, better have needsExtract() return false.

5696

Yes, there's a phase ordering issue here, and also one regarding isProfitableToScalarize() as mentioned above (which would be good to note somewhere). But the instruction(s) that may trigger an assert, being loads or stores, are the operands of I rather than I itself, right? In any case, suggest to have FilterOperands()/needsExtract() understand if they can use isScalarAfterVectorization() or else be conservative, as noted above. Perhaps make Filter[Extracting]Operands() a method, to be reused.