If the masked gathers can be reordered so that they produce a strided
access pattern, and that reordering does not affect the common reordering,
it is better to try reordering the masked gathers for better performance.
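To illustrate the idea in the summary, here is a minimal sketch of how one might check whether a group of loads becomes strided after reordering. It is not the code from this patch and does not use LLVM's API: the `StridedOrder` struct and the `findStridedOrder` function are hypothetical names, and the loads are modeled simply as integer element offsets from a common base pointer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical result type (not part of LLVM): the permutation that sorts
// the loads by offset, plus the constant distance between sorted offsets.
struct StridedOrder {
  std::vector<std::size_t> order; // order[i] = original index of the i-th sorted load
  std::int64_t stride;            // distance between consecutive sorted offsets
};

// Hypothetical helper (not part of LLVM): returns the sorting permutation if
// the sorted offsets are equally spaced by a nonzero stride, otherwise
// std::nullopt, meaning the group would have to stay a general gather.
std::optional<StridedOrder>
findStridedOrder(const std::vector<std::int64_t> &offsets) {
  if (offsets.size() < 2)
    return std::nullopt;

  // Start from the identity permutation and sort indices by offset.
  std::vector<std::size_t> order(offsets.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return offsets[a] < offsets[b];
                   });

  // The first gap defines the candidate stride; duplicates are rejected.
  const std::int64_t stride = offsets[order[1]] - offsets[order[0]];
  if (stride == 0)
    return std::nullopt;

  // Every remaining gap must match the candidate stride.
  for (std::size_t i = 2; i < order.size(); ++i)
    if (offsets[order[i]] - offsets[order[i - 1]] != stride)
      return std::nullopt;

  return StridedOrder{std::move(order), stride};
}
```

For example, offsets `{6, 0, 4, 2}` sort into `0, 2, 4, 6`, so the sketch reports a stride of 2 with the order `{1, 3, 2, 0}`, while offsets with unequal gaps yield no result. A real implementation would also have to confirm that applying this order does not conflict with the ordering chosen for the rest of the vectorized tree, which this sketch does not model.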
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
I've skimmed this, and it looks reasonable to me, but I'm not really an SLP reviewer.
I can confirm that this addresses the reduced case from https://github.com/llvm/llvm-project/issues/63854, but not the original un-reduced example from x264. I suspect the unreduced example is still hitting https://github.com/llvm/llvm-project/issues/63855.
Comment Actions
For clarity, this patch is not. The original issue which sparked both of these was x264. I have not reanalyzed, but given only one of them is being fixed here, my assumption is that the other still stands.
Comment Actions
I noticed that this patch causes a crash with this example code:
$ cat t.c
int a[][1];
int b, c, d;
int *e, *f, *g;
int h[4];
void i() {
  int j, k, l;
  e = a[b];
  g = h;
  long m = e[1] * f[63];
  l = m >> c;
  g[32] = l;
  m = e[33] * f[62];
  k = m >> c;
  g[33] = k;
  m = e[7] * f[61];
  j = m >> c;
  g[34] = j;
  m = e[49] * f[60];
  d = m >> c;
  g[35] = d;
}
$ clang -cc1 -target-cpu alderlake -O3 -vectorize-slp -emit-llvm t.c
Comment Actions
Thanks for the report, I will investigate it and fix it ASAP.
Should be fixed in 7ff83ed6cda068d99ec2926216d9868754da6e79