This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Fix PR63854: Add proper sorting of pointers for masked stores.
ClosedPublic

Authored by ABataev on Aug 3 2023, 9:05 AM.

Details

Summary

If the masked gathers can be reordered, it may produce strided access
pattern and the reordering does not affect common reodering, better to
try to reorder masked gathers for better performance.

Diff Detail

Event Timeline

ABataev created this revision.Aug 3 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 9:05 AM
ABataev requested review of this revision.Aug 3 2023, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 9:05 AM

I've skimmed this, and it looks reasonable to me, but not really an SLP reviewer.

I can confirm that this addresses the reduced case from https://github.com/llvm/llvm-project/issues/63854, but not the original un-reduced example from x264. I suspect the unreduced example is still hitting https://github.com/llvm/llvm-project/issues/63855.

I've skimmed this, and it looks reasonable to me, but not really an SLP reviewer.

I can confirm that this addresses the reduced case from https://github.com/llvm/llvm-project/issues/63854, but not the original un-reduced example from x264. I suspect the unreduced example is still hitting https://github.com/llvm/llvm-project/issues/63855.

I don't see how it is related to PR63855. Could you point, please?

reames added a comment.Aug 4 2023, 1:03 PM

I've skimmed this, and it looks reasonable to me, but not really an SLP reviewer.

I can confirm that this addresses the reduced case from https://github.com/llvm/llvm-project/issues/63854, but not the original un-reduced example from x264. I suspect the unreduced example is still hitting https://github.com/llvm/llvm-project/issues/63855.

I don't see how it is related to PR63855. Could you point, please?

For clarity, this patch is not. The original issue which sparked both of these was x264. I have not reanalyzed, but given only one of them is being fixed here, my assumption is that the other still stands.

ABataev updated this revision to Diff 551482.Aug 18 2023, 6:02 AM

Rebase, ping!

ABataev updated this revision to Diff 552119.Aug 21 2023, 1:31 PM

Rebase, ping!

reames accepted this revision.Aug 21 2023, 2:54 PM

I've skimmed this, and it looks reasonable to me, but not really an SLP reviewer.

Since no one else has chimed in, LGTM

This revision is now accepted and ready to land.Aug 21 2023, 2:54 PM

noticed that this patch causes a crash with this example code

$ cat t.c
int a[][1];
int b, c, d;
int *e, *f, *g;
int h[4];
void i() {
  int j, k, l;
  e = a[b];
  g = h;
  long m = e[1] * f[63];
  l = m >> c;
  g[32] = l;
  m = e[33] * f[62];
  k = m >> c;
  g[33] = k;
  m = e[7] * f[61];
  j = m >> c;
  g[34] = j;
  m = e[49] * f[60];
  d = m >> c;
  g[35] = d;
}
$ clang -cc1 -target-cpu alderlake -O3 -vectorize-slp -emit-llvm t.c

noticed that this patch causes a crash with this example code

$ cat t.c
int a[][1];
int b, c, d;
int *e, *f, *g;
int h[4];
void i() {
  int j, k, l;
  e = a[b];
  g = h;
  long m = e[1] * f[63];
  l = m >> c;
  g[32] = l;
  m = e[33] * f[62];
  k = m >> c;
  g[33] = k;
  m = e[7] * f[61];
  j = m >> c;
  g[34] = j;
  m = e[49] * f[60];
  d = m >> c;
  g[35] = d;
}
$ clang -cc1 -target-cpu alderlake -O3 -vectorize-slp -emit-llvm t.c

Thanks for report, will investigate it and fix ASAP

noticed that this patch causes a crash with this example code

$ cat t.c
int a[][1];
int b, c, d;
int *e, *f, *g;
int h[4];
void i() {
  int j, k, l;
  e = a[b];
  g = h;
  long m = e[1] * f[63];
  l = m >> c;
  g[32] = l;
  m = e[33] * f[62];
  k = m >> c;
  g[33] = k;
  m = e[7] * f[61];
  j = m >> c;
  g[34] = j;
  m = e[49] * f[60];
  d = m >> c;
  g[35] = d;
}
$ clang -cc1 -target-cpu alderlake -O3 -vectorize-slp -emit-llvm t.c

Thanks for report, will investigate it and fix ASAP

noticed that this patch causes a crash with this example code

$ cat t.c
int a[][1];
int b, c, d;
int *e, *f, *g;
int h[4];
void i() {
  int j, k, l;
  e = a[b];
  g = h;
  long m = e[1] * f[63];
  l = m >> c;
  g[32] = l;
  m = e[33] * f[62];
  k = m >> c;
  g[33] = k;
  m = e[7] * f[61];
  j = m >> c;
  g[34] = j;
  m = e[49] * f[60];
  d = m >> c;
  g[35] = d;
}
$ clang -cc1 -target-cpu alderlake -O3 -vectorize-slp -emit-llvm t.c

Must be fixed in 7ff83ed6cda068d99ec2926216d9868754da6e79