We have a problem in SCEVExpander around expansion of non-linear addrecs. Expanding an addrec SCEVExpander emits an expression to compute the addrec at a given iteration number (using SCEVAddRecExpr::evaluateAtIteration). As an iteration number it uses the canonical induction variable of the loop. Canonical IV here is an IV starting at 0 and incremented by 1 on every iteration, {0,+,1}. When the loop doesn't have a canonical IV SCEVExpander inserts one. It uses the type of the addrec to be expanded as the type for the canonical IV. This is not always correct.
If the addrec to expand is a linear addrec {start,+,step}, the expression to compute the value at the given iteration i is:
(start + i * step) mod MaxType,
where MaxType is the maximum value in the type of the addrec.
A canonical IV of the same type as addrec corresponds to (i mod MaxType). Using this IV as the iteration number i works fine for linear addrecs:
(start + i * step) mod MaxType = (start mod MaxType) + (i mod MaxType) * (step mod MaxType)
This is because mod commutates with + and *, so we can sink mod (truncation) down to the operands.
But it's not correct for non-linear addrecs because the expression to compute the value on the given iteration involves division (see BinomialCoefficient function in ScalarEvolution.cpp).
For example, look at scev-expand-canonical-iv-type.ll test in this patch. In this example, loop-vectorize expands i8 {0,+,2,+,1} addrec in the loop without a canonical IV. It inserts a canonical IV of type i8 and uses in the expression to compute the value of the addrec at the given iteration.
The expression is:
((i * (i - 1)) /u 2 + 2 * i) mod 256
Using a i8 canonical IV effectively turns it into:
((i mod 256) * ((i mod 256) - 1)) /u 2 + 2 * (i mod 256)
This is not equal to the original expression, because mod and division (truncation and lshr) don't commutate. In this case we need to used a canonical IV of a wider type. For the exact SCEVs of this example see below [1].
SCEVExpander needs to be aware that expansion of an addrec might need an canonical IV of a type wider than the addrec. This patch fixes the issue by introducing SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration method and using it in SCEVExpander.
This patch can be split into three distinct changes. I plan to split them when integrating, but post the initial review with all three in one for better context.
- Prepare SCEVExpander::visitAddRecExpr to use CanonicalIV wider than the addrec.
- Introduce SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration, MinIterationWidthForBinomialCoefficient with an assertion in BinomialCoefficient.
- Use SCEVAddRecExpr::minIterationWidthForEvaluateAtIteration in SCEVExpander::visitAddRecExpr to compute the type of the canonical IV.
This patch is currently going through our internal fuzzing and performance testing.
[1] Debug output from evaluateAtIteration before the fix:
evaluateAtIteration this = {0,+,2,+,1}<%outer_loop> evaluateAtIteration it = i8 %indvar evaluateAtIteration result = ((trunc i9 (((zext i8 (-1 + %indvar) to i9) * (zext i8 %indvar to i9)) /u 2) to i8) + (2 * %indvar))
After the fix:
evaluateAtIteration this = {0,+,2,+,1}<%outer_loop> evaluateAtIteration it = i9 %indvar evaluateAtIteration result = ((trunc i9 (((-1 + %indvar) * %indvar) /u 2) to i8) + (2 * (trunc i9 %indvar to i8)))
Does this mean that evaluateAtIteration may return an incorrect result? If so, I'd very much like to see an assert which trips on that usage.