This is an archive of the discontinued LLVM Phabricator instance.

[DivRemPairs] Freeze operands if they can be undef values
ClosedPublic

Authored by aqjune on Mar 20 2020, 2:09 AM.

Details

Summary

DivRemPairs is unsound with respect to undef values.

// bb1:
//   %rem = srem %x, %y
// bb2:
//   %div = sdiv %x, %y
// -->
// bb1:
//   %div = sdiv %x, %y
//   %mul = mul %div, %y
//   %rem = sub %x, %mul

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:

  %div = sdiv undef, 1 // %div = undef
  %rem = srem undef, 1 // %rem = 0
=>
  %div = sdiv undef, 1 // %div = undef
  %mul = mul %div, 1   // %mul = undef
  %rem = sub %x, %mul  // %rem = undef - undef = undef

http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.

Diff Detail

Event Timeline

aqjune created this revision.Mar 20 2020, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2020, 2:09 AM
aqjune edited the summary of this revision. (Show Details)Mar 20 2020, 2:11 AM
aqjune added subscribers: nlopes, regehr.

I agree that this is a good candidate for introducing more freeze.
Just to check my understanding of current behavior: the freeze insts will survive to SelectionDAGBuilder, and there they get removed?

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
307

freezed -> frozen

315

freezed -> frozen

spatel added inline comments.Mar 20 2020, 5:00 AM
llvm/lib/Transforms/Scalar/DivRemPairs.cpp
318

I think this should either be "FreezeInst *" or "auto *" to conform to LLVM coding conventions. (Same for line 325).

aqjune added a comment.EditedMar 21 2020, 12:26 AM

I assessed the performance impact by running test-suite & comparing generated assemblies before/after this patch, and found that there was only one assembly file having diff.
It was SingleSource/Regression/C/gcc-c-torture/execute/scal-to-vec1.c , and I could find that freeze was not simplifying vector constant with non-undef integers. I'll update https://reviews.llvm.org/D76010 to cover it.

Just to check my understanding of current behavior: the freeze insts will survive to SelectionDAGBuilder, and there they get removed?

Yes, freeze is simply removed when lowering to SelDag.
The relevant patch is https://reviews.llvm.org/D29014 . I was postponing to landing it to master because MachineIR people didn't accept the patch.
I think it is a good idea to splitting the patch D29014 to cover only SelDag first. What do you think?

aqjune updated this revision to Diff 251824.Mar 21 2020, 12:27 AM

Address comments

aqjune marked 3 inline comments as done.Mar 21 2020, 12:27 AM
aqjune edited the summary of this revision. (Show Details)Mar 21 2020, 1:07 AM

Yes, freeze is simply removed when lowering to SelDag.
The relevant patch is https://reviews.llvm.org/D29014 . I was postponing to landing it to master because MachineIR people didn't accept the patch.
I think it is a good idea to splitting the patch D29014 to cover only SelDag first. What do you think?

Yes, let's split it up, so we can make progress. Thanks for pushing things forward!

spatel accepted this revision.Mar 22 2020, 8:39 AM

LGTM - but let's get the relevant parts of D76010 and D29014 committed first, so we don't introduce any known regressions/logic bugs.
You can make this patch depend on the others here in Phabricator to show the relationship.

This revision is now accepted and ready to land.Mar 22 2020, 8:39 AM

As D76702 is merged, I'm looking how SingleSource/Regression/C/gcc-c-torture/execute/scal-to-vec1.c is changed now

I checked that the generated IR of scal-to-vec1.c with this patch is equivalent to the one without this patch because freeze was constant-folded away.
May I land this? Now I'll work on resolving the regression that was described at D29014.

I checked that the generated IR of scal-to-vec1.c with this patch is equivalent to the one without this patch because freeze was constant-folded away.
May I land this? Now I'll work on resolving the regression that was described at D29014.

Yes, please land.

This revision was automatically updated to reflect the committed changes.

Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.

Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.

Hi, thank you for the info. Once the blocked transformation is identified, I'll help finding out solutions for the problem.

Hi, I can confirm that D76010 unfortunately doesn't fix the regression.

I've narrowed the problem down to Loop Strength Reduction. It looks like SCEV can't see "through" the freeze node.

I'll try to get a reduced reproducer that shows the problem; and I'll be happy to test a patch.

Thanks!

Hi, I can confirm that D76010 unfortunately doesn't fix the regression.

I've narrowed the problem down to Loop Strength Reduction. It looks like SCEV can't see "through" the freeze node.

I'll try to get a reduced reproducer that shows the problem; and I'll be happy to test a patch.

Thanks!

I see. This link might be helpful: https://reviews.llvm.org/D70623
A possible wokaround would be to remove nsw/nuw tags from induction variable and freeze the initial value.

loop:
  %i = phi %init, %i.inc.fr
  %i.inc = add nsw 1, %i
  %i.inc.fr = freeze %i.inc // we can optimize this out by removing nsw flag from %i.inc and freezing %init
  br (%i.inc.fr < %n), loop, exit
=>
  %init.fr = freeze %init
loop:
  %i = phi %init.fr, %i.inc
  %i.inc = add 1, %i
  br (%i.inc < %n), loop, exit

BTW, IIUC one of the main motivation of DivRemPairs was to help better codegen. If there is any (aggressive) optimization after DivRemPairs, wouldn't it possibly shuffle the arranged instructions?
I wonder whether making DivRemPairs fire after other optimizations in LTO makes sense.

[...] It looks like SCEV can't see "through" the freeze node. [...]

I see. This link might be helpful: https://reviews.llvm.org/D70623

Ah, so it is quite difficult to extend SCEV. (at least, it's making my head hurt already!)

I tried turning off DivRemPairs during the pre-link optimzations, but this also broke the optimization in SCEV/loop-reduce, so I don't think it will help to move DivRemPairs later in the pipe.

I'm not sure I understand the other workaround.

However, I did manage to reduce the regression:

$ clang --target=aarch64-linux-gnu -O3 -flto -fno-strict-aliasing -o reduced.o -c reduced.c
$ clang --target=aarch64-linux-gnu -O3 -flto -fno-strict-aliasing -o reduced reduced.o

reduced.c:

typedef long a;
struct arc {
  int b;
} * h;
typedef struct {
  a c;
  struct arc arcs;
  a d, e, f;
} g;
g j;
a k, m;
g *l;
a n();
int main() { n(&j); }
a o(a p) {
  a q = p % l->d;
  if (q > l->e)
    k = p / l->d + (l->e * l->f + (q - l->e) * (l->f - 1));
  else
    k = p / l->d + l->f;
  return k;
}
a n(g *p) {
  a i;
  for (i = 0; i < p->c; i++, h = &p->arcs + o(++m))
    h->b = m;
  return 0;
}

Thank you for the reduced example. I'll have a look.

Thanks, much appreciated!

Hi,

I wrote https://bugs.llvm.org/show_bug.cgi?id=45885 about a crash which starts happening with this patch.

My guess is that isGuaranteedNotToBeUndefOrPoison doesn't like "weird" code that may appear in blocks that are not reachable from entry.

Hi. okay, I'll have a look.