This is an archive of the discontinued LLVM Phabricator instance.

Disable "[SCEV] Prove implications of different type via truncation"
AbandonedPublic

Authored by bjope on Oct 4 2021, 8:59 AM.

Details

Summary

This disables the feature added in commit 5ef84688fba28b9f0f that
proved implications of different type via truncations. This due to
problems seen (and reported in PR51869) about infinite loops that
started to happen after that commit.

It was not trivial to just revert the offending commit, so I added
an option to make it possible to enable/disable the feature from the
command line instead. With the default currently being that the
feature is disabled.

Some things worth mentioning:

  • test/Analysis/ScalarEvolution/srem.ll was impacted by the original patch, but nowadays we get the same result regardless of scalar-evolution-prove-implications-via-truncation.
  • test/Transforms/IndVarSimplify/widen-loop-comp.ll did not exist(?) when the feature was introduced. The test case was already a bit special as it uses scalar-evolution-use-expensive-range-sharpening. Now it also use scalar-evolution-prove-implications-via-truncation, so it is not testing the default setting.
  • The ProveImplicationViaNarrowing unittest test case for SCEV that was added in commit 5ef84688fba28b9f0f still passes even when disabling the feature. So this patch simply leaves that test case as is.

Event Timeline

bjope created this revision.Oct 4 2021, 8:59 AM
bjope requested review of this revision.Oct 4 2021, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2021, 8:59 AM
nikic added a subscriber: nikic.Oct 4 2021, 9:22 AM

Can the test case be reduced? I don't think we want to add a 5000 line test case for this.

bjope added a comment.Oct 4 2021, 10:16 AM

Can the test case be reduced? I don't think we want to add a 5000 line test case for this.

I made some attempts using bugpoint earlier (and this is actually a bit reduced compared to the original test case). A bit tricky to set it up as the fail scenario is that there should be a timeout, and then it will take lots of time to get bugpoint to actually reduce anything.
I also made some attempts adding some kind of debug printouts to detect failure by identifying some repeated patterns when calling ScalarEvolution::isImpliedCond, but never managed to get that working properly.
If I knew what the problem is it might be easier to reduce it ;-)

If anyone got some ideas on how to tackle the problem with reducing it I wouldn't mind testing those ideas.

(Maybe I should give "bugpoint + timeout" another try. But I think I need quite a long timeout setting, so it takes some time if it only attempts to remove 1 instruction every 5 minutes or so.)

bjope planned changes to this revision.Oct 4 2021, 10:41 AM

Can the test case be reduced? I don't think we want to add a 5000 line test case for this.

I made some attempts using bugpoint earlier (and this is actually a bit reduced compared to the original test case). A bit tricky to set it up as the fail scenario is that there should be a timeout, and then it will take lots of time to get bugpoint to actually reduce anything.
I also made some attempts adding some kind of debug printouts to detect failure by identifying some repeated patterns when calling ScalarEvolution::isImpliedCond, but never managed to get that working properly.
If I knew what the problem is it might be easier to reduce it ;-)

If anyone got some ideas on how to tackle the problem with reducing it I wouldn't mind testing those ideas.

(Maybe I should give "bugpoint + timeout" another try. But I think I need quite a long timeout setting, so it takes some time if it only attempts to remove 1 instruction every 5 minutes or so.)

Was a couple of weeks ago that I tried reducing this the last time. Now I've at least managed to reduce it a bit more manually (removing some stores/calls/branches, and then running with -dce -simplifycfg, etc). I'll continue a bit more and then I'll update this patch.

bjope updated this revision to Diff 377039.Oct 4 2021, 3:18 PM

Reduced the test case. Maybe too much, because now it terminates after around 10 minutes. The previous (>5000 lines) version of the test case is still running after 2 hours (and I do not even know if it will terminate).

mkazantsev requested changes to this revision.Oct 4 2021, 10:52 PM

This patch is one year old. It is extremely surprising that you are trying to disable it rather than find the root cause.

I don't believe this is infinite recursion. Before all, infinite recursion should end up with stack overflow and not timeout. I can believe there is some exponential explosion happening, but in this case you need to find how exactly it happens and fix the acutal reason. The fact that this patch triggers something very slow doesn't convince me that this patch is guilty, and this test doesn't allow to figure it out easily.

Could you please check (at least, but not limited to):

  • Do we end up creating some HUGE SCEVs in your test?
  • Do we end up making the same implication query millions of times?
  • Do we end up walking huge dom trees for execution of some queries?

Before we decide to switch something off, I want to know why exactly.

This revision now requires changes to proceed.Oct 4 2021, 10:52 PM

I ran -analyze -scalar-evolution on this test and it indeed works slowly. It doesn't hang, though. It's just extremely slow work due to exponential behavior during range inference. Not obvious that it is somehow caused by trunc inference. Pretty sure it's possible to construct same test without trunc/sexts.

bjope added a comment.Oct 5 2021, 1:24 AM

This patch is one year old. It is extremely surprising that you are trying to disable it rather than find the root cause.

I don't believe this is infinite recursion. Before all, infinite recursion should end up with stack overflow and not timeout. I can believe there is some exponential explosion happening, but in this case you need to find how exactly it happens and fix the acutal reason. The fact that this patch triggers something very slow doesn't convince me that this patch is guilty, and this test doesn't allow to figure it out easily.

Could you please check (at least, but not limited to):

  • Do we end up creating some HUGE SCEVs in your test?
  • Do we end up making the same implication query millions of times?
  • Do we end up walking huge dom trees for execution of some queries?

Before we decide to switch something off, I want to know why exactly.

Hi @mkazantsev,

I have tried debugging PR51869 a bit, but I don't really know this code. I've wrote a bit about it in https://bugs.llvm.org/show_bug.cgi?id=51869, and added you as susbscriber, hoping to get some feedback or reactions. But since there was no reactions I did not add much info (I thought that maybe the author of the code at least could take a quick look to see that it wasn't an obvious mistake. I for example see that there are a couple of checks related to LHS before calling sKnownPredicate(ICmpInst::ICMP_ULE, FoundLHS, MaxValue), but nothing for RHS and it is the sKnownPredicate(ICmpInst::ICMP_ULE, FoundRHS, MaxValue) call that didn't seem to return (at least not with the reproducer in the PR51869). If I remember correctly I could not see any deep recursion, but there were lots of calls to ScalarEvolution::isImpliedCond( and I suspected some kind of ping-pong effect.

We've seen timeouts in several different test cases during the last year. And bisecting points at your commit every time. However, it was quite recently we managed to reduce one of those test cases down to something smaller (~5000 lines of IR, running on a single pass, without any downstream intrinsics and without any unusual address spaces etc). I think a collegue of mine has tried to reach out for help, and we have added some comments in https://reviews.llvm.org/D89548 trying to get some reactions. This ticket was kind of another attempt to get some help/feedback. Nice to see that it got some interest. However, this ticket is about disabling the analysis that is causing bad time complexity, and perhaps we should discuss the actual problem with PR51869 in bugzilla.

Btw, since the only test case that seem to validate that we actually get some transforms given that this feature is turned on is the test case using -scalar-evolution-use-expensive-range-sharpening which is disabled by default. So we do not really have any lit tests verifying that this extra analysis is working as intended, except for that special test case. That could actually be another reason to disable this, IMHO.

Anyway, I want to switch this off because it has been causing lots of trouble and I do not know exactly why. If I knew why, then hopefully that could be fixed instead. My idea was to leave that until someone bothered about PR51869 (or until someone though it would be worth trying to enable this feature again (because right now it seem a bit broken when it comes to time complexity). I'd be happy to see a proper solution to the problem though. So the I figure the goal should be to enable -scalar-evolution-prove-implications-via-truncation again later. But I think I need some help debugging and figuring out why we start to see these timeout problems given your patch.

fhahn added a subscriber: fhahn.Oct 5 2021, 2:17 AM

This patch is one year old. It is extremely surprising that you are trying to disable it rather than find the root cause.

I don't believe this is infinite recursion. Before all, infinite recursion should end up with stack overflow and not timeout. I can believe there is some exponential explosion happening, but in this case you need to find how exactly it happens and fix the acutal reason. The fact that this patch triggers something very slow doesn't convince me that this patch is guilty, and this test doesn't allow to figure it out easily.

Could you please check (at least, but not limited to):

  • Do we end up creating some HUGE SCEVs in your test?
  • Do we end up making the same implication query millions of times?
  • Do we end up walking huge dom trees for execution of some queries?

Before we decide to switch something off, I want to know why exactly.

I added a smaller (~500 lines) reproducer that takes ~15s on my machine on main: https://bugs.llvm.org/attachment.cgi?id=25328. With D111066 it goes down to 0.2s, which I think clearly shows that . So it's not an infinite loop but a compile-time explosion with ~60% of compile-time spent in evaluatePredicateAt, ~25% in getStrengthenedNoWrapFlagsFromBinOp and ~10% in getSCEV.

It looks related to the depth of chained basic blocks. The reproducer has a lot of basic block chained together with unconditional branches. I already removed a lot of similar chains to bring down the compile-time.

fhahn added a comment.Oct 5 2021, 2:18 AM

This patch is one year old. It is extremely surprising that you are trying to disable it rather than find the root cause.

I don't believe this is infinite recursion. Before all, infinite recursion should end up with stack overflow and not timeout. I can believe there is some exponential explosion happening, but in this case you need to find how exactly it happens and fix the acutal reason. The fact that this patch triggers something very slow doesn't convince me that this patch is guilty, and this test doesn't allow to figure it out easily.

Could you please check (at least, but not limited to):

  • Do we end up creating some HUGE SCEVs in your test?
  • Do we end up making the same implication query millions of times?
  • Do we end up walking huge dom trees for execution of some queries?

Before we decide to switch something off, I want to know why exactly.

I added a smaller (~500 lines) reproducer that takes ~15s on my machine on main: https://bugs.llvm.org/attachment.cgi?id=25328. With D111066 it goes down to 0.2s, which I think clearly shows that . So it's not an infinite loop but a compile-time explosion with ~60% of compile-time spent in evaluatePredicateAt, ~25% in getStrengthenedNoWrapFlagsFromBinOp and ~10% in getSCEV.

Ah the recent reproducer from @bjope shows the issue with even fewer lines!

bjope added inline comments.Oct 8 2021, 7:28 AM
llvm/lib/Analysis/ScalarEvolution.cpp
10666

I tried changing the order of this if statement and the nestled if-statement. I.e. doing the isImpliedCondBalancedTypes check first, then it will converge faster. Well at least for the reproducer.

No idea if it is more common that the isKnownPredicate checks will be true more often that the isImpliedCondBalancedTypes check. One reasoning could be that it perhaps converge faster since doing the new lookup on the truncated expressions is changing the state a bit more compared to just calling isKnownPredicate on the current state. But I have no idea really. Maybe one could gather some statistics for this based on some test suite.

bjope added a comment.Oct 19 2021, 8:22 AM

An alternative solution is presented in D112080 (using a different order for the checks involved).

bjope abandoned this revision.Oct 19 2021, 12:46 PM

Abandoned. Obsoleted by D111896.

llvm/lib/Analysis/ScalarEvolution.cpp