Now for induction variables it's possible to have only -1 and +1 step values.
This patch adds support for other than +1 and -1 constant step values.
Details
Diff Detail
Event Timeline
Can you please include a more-verbose description of what's going on here? Since we don't have masked stores, I assume this affects only loops without stores (which means reductions). For reductions with non-unit stride, we can indeed load data in larger vectors and then only use some of the vector lanes for the reduced value (and I agree this is a useful capability to have), but for that I'd expect to see changes in the code after this comment:
if (VF > 1) {
  // VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
  // and vector ops, reducing the set of values being computed by half each
  // round.but I don't see any changes there, so I don't understand how you're selecting out only the correct vector lanes.
Hi Alexey,
Did you verify the correctness and the performance impacts of this patch? You should run the LLVM test suite with your changes.
This patch looks very suspicious. The test that you are chancing (example13) has code that looks like this:
for (j = 0; j < N; j+=8) {
  diff += (a[i][j] - b[i][j]);
}How are you generating consecutive vector loads? This looks like a bug. Also, you did not explain what programs you are expecting to vectorize and how. This is a huge patch and you should explain what changes you made and why.
-Nadav
Hi Alexey,
I'm investigating a similar issue. I think your patch is great. I tried your patch with several benchmarks for correctness. But there are some failures in LNT (ClamAV, TSVC) and SPEC CPU2000 (164.gzip, 256.bzip2).
As this patch was sent two months ago. I'm not sure whether you are still working on this patch? Or I'd like to help to fix such failures and send it out again.
Thanks,
-Hao
Hi Hao,
I am not working on this patch now.
You can take it and fix the problems.
Thanks,
Alexey