This is an archive of the discontinued LLVM Phabricator instance.

Fix incorrect expand of non-linear addrecs
AbandonedPublic

Authored by apilipenko on May 28 2019, 5:27 PM.

Details

Summary

We have a problem in SCEVExpander around expansion of non-linear addrecs. Expanding an addrec SCEVExpander emits an expression to compute the addrec at a given iteration number (using SCEVAddRecExpr::evaluateAtIteration). As an iteration number it uses the canonical induction variable of the loop. Canonical IV here is an IV starting at 0 and incremented by 1 on every iteration, {0,+,1}. When the loop doesn't have a canonical IV SCEVExpander inserts one. It uses the type of the addrec to be expanded as the type for the canonical IV. This is not always correct.

If the addrec to expand is a linear addrec {start,+,step}, the expression to compute the value at the given iteration i is:

(start + i * step) mod MaxType,

where MaxType is the maximum value in the type of the addrec.

A canonical IV of the same type as addrec corresponds to (i mod MaxType). Using this IV as the iteration number i works fine for linear addrecs:

(start + i * step) mod MaxType = (start mod MaxType) + (i mod MaxType) * (step mod MaxType)

This is because mod commutates with + and *, so we can sink mod (truncation) down to the operands.

But it's not correct for non-linear addrecs because the expression to compute the value on the given iteration involves division (see BinomialCoefficient function in ScalarEvolution.cpp).

For example, look at scev-expand-canonical-iv-type.ll test in this patch. In this example, loop-vectorize expands i8 {0,+,2,+,1} addrec in the loop without a canonical IV. It inserts a canonical IV of type i8 and uses in the expression to compute the value of the addrec at the given iteration.

The expression is:

((i * (i - 1)) /u 2 + 2 * i) mod 256

Using a i8 canonical IV effectively turns it into:

((i mod 256) * ((i mod 256) - 1)) /u 2 + 2 * (i mod 256)

This is not equal to the original expression, because mod and division (truncation and lshr) don't commutate. In this case we need to used a canonical IV of a wider type. For the exact SCEVs of this example see below [1].

SCEVExpander needs to be aware that expansion of an addrec might need an canonical IV of a type wider than the addrec. This patch fixes the issue by introducing SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration method and using it in SCEVExpander.

This patch can be split into three distinct changes. I plan to split them when integrating, but post the initial review with all three in one for better context.

  1. Prepare SCEVExpander::visitAddRecExpr to use CanonicalIV wider than the addrec.
  2. Introduce SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration, MinIterationWidthForBinomialCoefficient with an assertion in BinomialCoefficient.
  3. Use SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration in SCEVExpander::visitAddRecExpr to compute the type of the canonical IV.

This patch is currently going through our internal fuzzing and performance testing.

[1] Debug output from evaluateAtIteration before the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i8 %indvar
evaluateAtIteration result = ((trunc i9 (((zext i8 (-1 + %indvar) to i9) * (zext i8 %indvar to i9)) /u 2) to i8) + (2 * %indvar))

After the fix:

evaluateAtIteration this = {0,+,2,+,1}<%outer_loop>
evaluateAtIteration it = i9 %indvar
evaluateAtIteration result = ((trunc i9 (((-1 + %indvar) * %indvar) /u 2) to i8) + (2 * (trunc i9 %indvar to i8)))

Diff Detail

Event Timeline

apilipenko created this revision.May 28 2019, 5:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2019, 5:27 PM
Herald added a subscriber: javed.absar. · View Herald Transcript
apilipenko edited the summary of this revision. (Show Details)May 28 2019, 5:27 PM

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe along side your assert to see if we're missing any other cases - and then spend more time considering your full fix.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

p.s. Please go ahead and land your preparatory changes (1). The context was helpful, but we should get those out of the way to simplify the rest of the review. (This is mostly just factoring out CanonicalIVType right?)

lib/Analysis/ScalarEvolutionExpander.cpp
1486

Removing this entirely seems like overkill. Maybe either add a restriction to a) affine addrecs or b) the bitwidth of the existing canonical IV is sufficient?

1568

Not sure you actually want getUnknown? Did you mean getSCEV?

Artur, nice find. In terms of staging complexity, have you considered how impactful it would be to simply refuse to generate the value at the given iteration in this case? evaluateAtIteration is allowed to return SCEVCouldNotCompute. I'm tempted to first introduce a bailout for a quick correctness fix - maybe along side your assert to see if we're missing any other cases - and then spend more time considering your full fix.

Yes, evaluateAtIteration can return SCEVCouldNotCompute, but it doesn't seem like SCEVExpander is ready for that. What we can do instead is to use expandAddRecExprLiterally for non-affine addrecs.

One observation: I really don't think we want to be emitting an i9. We probably want to be rounding up to the nearest legal type. For i65, I doubt we want to be generating it at all. It's probably best to simply bail out in that case.

evaluateAtIteration is already emitting i9 types for high order binomial coefficients. See the debug output in the review description. But I guess you are right, that we don't want to introduce canonical IVs of non-legal types. If we need to bail out for some types the the bail out would be expandAddRecExprLiterally.

In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.

...
In general I like the idea of using expandAddRecExprLiterally for non-affine addrecs, I'm going to run some performance experiments with this approach to see the impact.

Artur and I spoke offline. The workaround suggestion I made doesn't appear to really work well, but Artur is going to move forward with the expandAddRecExprLiterally idea when we'd overflow the IV type. This seems to be a practical fix for non-affine IVs.

Fall back to non-canonical mode for non-affine addrecs.

reames requested changes to this revision.Jun 3 2019, 2:04 PM
reames added inline comments.
include/llvm/Analysis/ScalarEvolutionExpressions.h
347

Does this mean that evaluateAtIteration may return an incorrect result? If so, I'd very much like to see an assert which trips on that usage.

lib/Analysis/ScalarEvolutionExpander.cpp
1486

How about only falling back to literal expansion for non-affine addrecs which actually need the wider expansion?

This revision now requires changes to proceed.Jun 3 2019, 2:04 PM
apilipenko abandoned this revision.Feb 21 2020, 4:30 PM

Finished by @ebrevnov as D65276.