Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.
Fixes PR41294
Differential D59995
[LV] Exclude loop-invariant inputs from scalar cost computation. fhahn on Mar 29 2019, 8:03 AM. Authored by
Details Loop invariant operands do not need to be scalarized, as we are using Fixes PR41294
Diff Detail
Event TimelineComment Actions The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following: auto needsExtract = [&](Instruction *I) -> bool { return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF); }; which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM? Note that there's also CM.isProfitableToScalarize(I, VF)), but it relies on having computed costs, so difficult to use when computing the cost (of a User). Skipping it would only affect accuracy of resulting cost, considering Operands that can be vectorized but will not be due to profitability. Fixing getVectorCallCost deserves another testcase. Would be good to hoist such invariant instructions %a = extractvalue { i64, i64 } %sv, 0 %b = extractvalue { i64, i64 } %sv, 1 out of the loop before LV, or at-least have LV recognize them as uniform.
Comment Actions Split out moving various functions to LoopVectorizationCostModel to D61638. Hoisted needsExtract to do the check if we need to extract/scalarize an operand Comment Actions Done.
Added a test case with call. But thinking about it again, it does not really test the issue. Not sure it is actually possible to test getVectorCallCost, as there are no vector call functions that take struct values?
Yep, I'll update it in a second. In practice, I don't think there are any intrinsics that take struct types, but maybe in the future it might become a problem.
Yep, I can look into that as a follow up. LICM should hoist those things, but I think in general we cannot guarantee that all loop-invariant instructions are hoisted out before LoopVectorize (the test case came from a fuzzer). Do you think we should try to hoist them in LV? I would assume a later run of LICM will hoist them. Comment Actions Very good, thanks for the refactoring.
Good point. Does the test case pass w/o this patch?
The thought was to check LICM, and check LV's uniform analysis.
Comment Actions Addressed Ayal's comments, thanks and sorry for the long delay, it somehow dropped off my radar. Yes both tests fail with the patch. The only thing that cannot be tested is the combination of intrinsic taking struct
I am not entirely sure I understand what you mean here. I checked LICM and it will Currently they are not marked as uniform in LV, but I can fix that in a follow-up Comment Actions Some format typos, and clarifying if needsExtract() should assume vectorized or scalarized before scalars are computed.
Comment Actions
Comment Actions LGTM, with some additional thoughts provoked by this fix.
Comment Actions This triggers failed asserts in some cases, reproable with https://martin.st/temp/loadimage-preproc.cpp.xz, with clang++ -target i686-w64-mingw32 -c -O3 loadimage-preproc.cpp. Will post a proper bug report later today, unless someone else beats me to it. Comment Actions Thanks for reporting the issue, should be fixed in rL366049. @Ayal, the problem was that we are not computing the scalarized cost in getVectorIntrinsicCost and TTI::getIntrinsicInstrCost expects the full list of arguments, so we should not filter it. I'll check if there's a reason we do not consider the scalarized cost there. |
Thanks @fhahn, just pointing out the above documentation is also inaccurate, along with the associated cost.