Hi,
Currently the LoopVectorizer (LV) only handles induction variables with a step of -1 or +1. For induction variables with any other step, LV does nothing.
Alexey wrote a patch in D6051 to support arbitrary induction variable steps. I think his patch is very useful, so I took it over and fixed some bugs (minor bugs in the induction calculation which caused some runtime failures in SPEC2000 and LNT). Now this patch passes our internal tests.
There are two kinds of induction variables:
- integer induction: for (int i = 0; i < 1024; i+=2) { int tmp = *A++; sum += i * tmp; }
"i" is an integer induction variable of step 2. Such a case can be vectorized well once we support arbitrary induction variable steps.
- pointer induction: for (int i = 0; i < 1024; i++) { int tmp0 = *A++; int tmp1 = *A++; sum += tmp0 * tmp1; }
pointer "A" is a pointer induction variable of step 2. Even if we support arbitrary steps, we currently still cannot vectorize such a case well. LoopVectorizer will say "vectorization is possible but not beneficial". But we can still force the LoopVectorizer to do vectorization.
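To make the first case concrete, here is a rough hand-written model of what vectorizing a step-2 integer induction could look like. This is only a sketch, not the code the patch generates: the vectorization factor VF = 4 and the function names are my own assumptions for illustration. The vector induction starts at {0, 2, 4, 6} and advances by VF * step = 8 per vector iteration.

```c
#define N  1024  /* loop bound taken from the example above */
#define VF 4     /* assumed vectorization factor (illustration only) */

/* Scalar reference: "i" is an integer induction variable of step 2. */
int sum_scalar(const int *A) {
    int sum = 0;
    for (int i = 0; i < N; i += 2) {
        int tmp = *A++;
        sum += i * tmp;
    }
    return sum;
}

/* Hypothetical model of the vectorized loop, with each "vector"
 * written as a VF-element array processed lane by lane. */
int sum_vector_model(const int *A) {
    int vec_ind[VF];        /* vector induction variable */
    int vec_sum[VF] = {0};  /* per-lane partial sums */

    /* Initial vector induction value: {0, 2, 4, 6}. */
    for (int l = 0; l < VF; ++l)
        vec_ind[l] = l * 2;

    /* N / 2 scalar iterations become (N / 2) / VF vector iterations. */
    for (int it = 0; it < (N / 2) / VF; ++it) {
        for (int l = 0; l < VF; ++l) {
            vec_sum[l] += vec_ind[l] * A[it * VF + l];
            vec_ind[l] += VF * 2;  /* advance each lane by VF * step */
        }
    }

    /* Horizontal reduction of the partial sums. */
    int sum = 0;
    for (int l = 0; l < VF; ++l)
        sum += vec_sum[l];
    return sum;
}
```

Both functions read N / 2 = 512 elements of A and should return the same value for any input that does not overflow int.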
Actually, if the target supports masked/interleaved load/store, we can vectorize the second case very well. For example, the AArch64 backend supports interleaved load/store. To vectorize "tmp0" and "tmp1", we only need one interleaved load such as "LD2 {V0, V1}, [X0]". Vector registers V0 and V1 will contain the de-interleaved data: V0 contains "A[0], A[2], A[4], A[6]", and V1 contains "A[1], A[3], A[5], A[7]".
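The effect of the interleaved load can be modelled in plain C. Again this is only a sketch under my own assumptions (VF = 4, hypothetical function names, arrays standing in for vector registers); it is not what the patch or the backend emits. load_interleaved2 plays the role of "LD2 {V0, V1}, [X0]".

```c
#define N  1024  /* loop bound taken from the second example */
#define VF 4     /* assumed vectorization factor (illustration only) */

/* Scalar reference: pointer "A" advances by 2 elements per iteration. */
int dot_pairs_scalar(const int *A) {
    int sum = 0;
    for (int i = 0; i < N; i++) {
        int tmp0 = *A++;
        int tmp1 = *A++;
        sum += tmp0 * tmp1;
    }
    return sum;
}

/* Models an interleaved load of 2 * VF elements:
 * v0 gets p[0], p[2], p[4], p[6]; v1 gets p[1], p[3], p[5], p[7]. */
static void load_interleaved2(const int *p, int v0[VF], int v1[VF]) {
    for (int l = 0; l < VF; ++l) {
        v0[l] = p[2 * l];
        v1[l] = p[2 * l + 1];
    }
}

/* Hypothetical model of the vectorized loop using the interleaved load. */
int dot_pairs_vector_model(const int *A) {
    int acc[VF] = {0};  /* per-lane partial sums */
    for (int i = 0; i < N; i += VF) {
        int v0[VF], v1[VF];
        load_interleaved2(A, v0, v1);
        for (int l = 0; l < VF; ++l)
            acc[l] += v0[l] * v1[l];
        A += 2 * VF;    /* pointer induction advances by VF * step */
    }

    /* Horizontal reduction of the partial sums. */
    int sum = 0;
    for (int l = 0; l < VF; ++l)
        sum += acc[l];
    return sum;
}
```

Both functions read 2 * N = 2048 elements of A. Without an interleaved load, tmp0 and tmp1 would each need a strided gather, which is presumably why the cost model rejects this loop today.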
According to tests on AArch64 targets, this patch has no big impact on performance. There are few cases like the first one. There are many cases like the second one, but for those the loop vectorizer decides that vectorization is not beneficial and most likely just interleaves. I think that if we support masked/interleaved load/store in the future, we can get many performance improvements, and that the first step is for the LoopVectorizer to support arbitrary induction variable steps.
Review please.
Thanks,
-Hao
I prefer that is* functions return a boolean, but this one returns a (-1, 0, 1) value. Maybe getConsecutiveDirection() would be a better name?