This is an archive of the discontinued LLVM Phabricator instance.

[x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses across basic blocks in the limited cases where it is very straight forward to do so.
ClosedPublic

Authored by chandlerc on Apr 15 2018, 9:29 AM.

Details

Summary

This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow managing
arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these
seem at all promising so if those cases come up, we'll almost certainly
need to rewrite the parts of LLVM that produce those patterns.

We do now require dominator trees in order to reliably diagnose patterns
that would require PHI nodes. This is a bit unfortunate but it seems
better than the completely mysterious crash we would get otherwise.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.Apr 15 2018, 9:29 AM
emaste added a subscriber: emaste.Apr 15 2018, 8:03 PM
rnk added a subscriber: rnk.Apr 16 2018, 2:26 PM
rnk added inline comments.
llvm/test/CodeGen/X86/copy-eflags.ll
293 ↗(On Diff #142573)

This infinite loop with selects of undef feels over-reduced. It seems fragile, since a pass could come along and optimize the select to the non-undef operand.

I liked the test case from PR37133, but it's reliance on debug info (any, not just standalone) scares me:

int a, b;
unsigned long long c;
void e() {
  long long d;
  b = 33;
  d = a;
  if (c < a) {
    b = 32;
    d = c;
  }
  if (d)
    b = 2;
}

We should reopen it or file a split separate bug, because we generate two more BBs on that code when debug info is enabled. :(

In any case, is there a better way to write this test case using legalized arithmetic? An i128 x64 test case with live-out eflags might also be interesting.

chandlerc added inline comments.Apr 16 2018, 3:37 PM
llvm/test/CodeGen/X86/copy-eflags.ll
293 ↗(On Diff #142573)

I actually tried some to remove the undef stuff. I couldn't find a pattern that triggered the bug. =[ I guess I can keep trying...

chandlerc updated this revision to Diff 142721.Apr 16 2018, 5:03 PM

Arrange this test case to be slightly less brittle.

chandlerc added inline comments.Apr 16 2018, 5:06 PM
llvm/test/CodeGen/X86/copy-eflags.ll
293 ↗(On Diff #142573)

Got the undefs out. How much more should I do here?

rnk accepted this revision.Apr 17 2018, 9:44 AM

lgtm

llvm/test/CodeGen/X86/copy-eflags.ll
293 ↗(On Diff #142573)

I guess the live-out eflags are getting created by a combination of SBB and select-to-jump x86 lowering logic. I guess this is as reduced as we can get.

This revision is now accepted and ready to land.Apr 17 2018, 9:44 AM
This revision was automatically updated to reflect the committed changes.
mzolotukhin added inline comments.
llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457

Could all of this be under DEBUG? The fatal error we're reporting here looks like an internal compiler assert anyway.
If we put in entirely under DEBUG, we can avoid building dominators in Release compilers.

chandlerc added inline comments.Apr 19 2018, 12:23 AM
llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457

The reason we've kept a number of these as fatal errors is due to our concern about latent problems in LLVM. We're essentially introducing a pretty novel set of constraints on what kinds of EFLAGS copies can be lowered, and trying to get somewhat more reliable failure modes if this bites us...

And at least one of these already helped uncover a bug (that this patch is fixing)...

Is the domtree analysis cost showing up significantly in compile times for you? Are there other concerns?

mzolotukhin added inline comments.Apr 19 2018, 12:43 AM
llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457

I see a value of having it on at least for some time, but I think in the long run we should turn it into an assert. Would it be possible?

I just noticed a new pass in the pipeline and wanted to check if it can be avoided. On O0 dom-tree analysis costs ~0.25% geomean (on CTMark), so each separate invocation doesn't seem to cost much, but they add up eventually.

By the way, why didn't we catch the bug you are talking about with RA compiler bots? Maybe that's an argument for adding more RA bots?

chandlerc added inline comments.Apr 19 2018, 1:26 AM
llvm/trunk/lib/Target/X86/X86FlagsCopyLowering.cpp
445–457

Sorry, I misunderstood. Long term, we can almost certainly move all of this to an assert-only kind of check. We just are benefiting (for now) from more aggressive checking.

But you ask a great question: why don't the release+asserts deployments catch these?

The frustrating thing is that some of the patterns here are *really* rare, or really tricky to reproduce. One of the test cases we got only reproduces when you build the input with debug info enabled! We still don't know why. The other test case only reproduces on 32-bit x86, where we (sadly) have much less testing.

It's also somewhat rare for benchmarks to hit this because the old lowering was *really* slow. If it were happening frequently in benchmarks, it might have been fixed sooner or the benchmarks would have been changed to not do that.... =/