This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Apply loop guards when computing max BTC for arbitrary steps.
ClosedPublic

Authored by fhahn on Nov 10 2021, 9:33 AM.

Details

Summary

Similar to other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.
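
As a hypothetical source-level illustration (this is not the test case added by the patch), a dominating guard can bound the trip count of a loop whose step is neither 1 nor -1:

  // Hypothetical example: the guard `n <= 8` dominates the loop, so applying
  // loop guards lets SCEV report a small constant max backedge-taken count
  // even though the step is 2. Without the guard, the bound must cover the
  // full range of `n`.
  void store_every_other(int n, int *p) {
    if (n <= 8) {
      for (int i = 0; i < n; i += 2) // step is neither 1 nor -1
        p[i] = 0;
    }
  }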

Fixes PR52464.

Diff Detail

Event Timeline

fhahn created this revision.Nov 10 2021, 9:33 AM
fhahn requested review of this revision.Nov 10 2021, 9:33 AM
Herald added a project: Restricted Project.Nov 10 2021, 9:33 AM
nikic added inline comments.Nov 10 2021, 9:46 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9677

Same as in other cases, this assertion likely doesn't always hold.

mkazantsev added inline comments.Nov 10 2021, 9:58 PM
llvm/lib/Analysis/ScalarEvolution.cpp
9677

Isn't a breach of this a bug we want to fix?

I'd rather move the assert inside applyLoopGuards. If it breaks, that's a subject for improving our reasoning. I can imagine it could be breaking because of issues with caching (e.g. if addrecs are involved), though. So, unless the failures are widespread and disruptive, I'd prefer it to stay and catch opportunities for us.

I'd rather move the assert inside applyLoopGuards. If it breaks, that's a subject for improving our reasoning. I can imagine it could be breaking because of issues with caching (e.g. if addrecs are involved), though. So, unless the failures are widespread and disruptive, I'd prefer it to stay and catch opportunities for us.

We already know that the assertion is going to fail and also know why. See https://reviews.llvm.org/D102267#inline-967507 for two suggestions on how to avoid pessimization, or at least the known cases of it. This should either be addressed first, or the assert should be omitted.

fhahn updated this revision to Diff 386803.Nov 12 2021, 4:23 AM

Replace the over-aggressive assertion with selecting the minimum max BTC. Added a test case where the assert is violated in 69c1cbe20f5d.

fhahn marked 2 inline comments as done.Nov 12 2021, 4:29 AM
fhahn added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
9677

Same as in other cases, this assertion likely doesn't always hold.

Yeah, turns out I was a bit too optimistic. I didn't hit the assertion when building a large number of projects, but I managed to construct a crashing test from other existing problematic tests. I added the test in 69c1cbe20f5d and turned the assertion into selecting the minimum.

I am also looking into weeding out the cases where applying the guard info pessimizes results. Unfortunately there are some cases where it pessimizes the range even if we only allow compares with constants. I'm still digging into the details, but it might be a case where the original range is not correct. I'm planning on sharing more on that soon.
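
To make the shape of the change concrete, here is a minimal sketch of the idea, assuming MaxBTC is the bound computed without guard information (the function name is illustrative; this is not the committed diff):

  #include "llvm/Analysis/ScalarEvolution.h"
  using namespace llvm;

  // Instead of asserting that the guard-rewritten expression is never worse,
  // keep whichever bound is smaller, so a known pessimization case cannot
  // regress the final result.
  static const SCEV *clampMaxBTCWithGuards(ScalarEvolution &SE,
                                           const SCEV *MaxBTC, const Loop *L) {
    const SCEV *Guarded = SE.applyLoopGuards(MaxBTC, L);
    return SE.getUMinExpr(MaxBTC, Guarded);
  }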

nikic accepted this revision.Nov 12 2021, 9:03 AM

LG

llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
66–67

The zext is not in the condition here, so it doesn't really seem relevant?

This revision is now accepted and ready to land.Nov 12 2021, 9:03 AM
reames accepted this revision.Nov 12 2021, 9:11 AM

LGTM as well. Amusingly, I'd stumbled into a variant of this as well just yesterday. :)

fhahn updated this revision to Diff 387552.Nov 16 2021, 3:17 AM

rebase after landing D113577

This revision was landed with ongoing or failed builds.Nov 17 2021, 3:01 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.Nov 17 2021, 4:32 AM
fhahn added inline comments.
llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
66–67

Thanks, I adjusted the name to @rewrite_zext_with_info_from_icmp_ne to hopefully make it a bit clearer in the committed version.

alexfh added a subscriber: alexfh.Dec 10 2021, 9:29 AM

Hi Florian,

this revision is causing compiler crashes on a number of translation units in our code. This _may_ be due to an increase in stack depth, but I'm not sure yet. Is a significantly increased stack depth expected after this patch? Is it justifiable?

The failures look more or less like this:

1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '...'.
4.	Running pass 'Loop Pass Manager' on function '@...'
5.	Running pass 'Induction Variable Users' on basic block '%107409'
  #0 0x0000562126256c38 llvm::sys::RunSignalHandlers() (clang+0x6c56c38)
  #1 0x000056212625926c SignalHandler(int) (clang+0x6c5926c)
  #2 0x00007faea56f9750 __restore_rt (libpthread.so.0+0x15750)
  #3 0x0000562125ec3b1b computeKnownBitsFromAssume(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c3b1b)
  #4 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
  #5 0x0000562125ec2de5 computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c2de5)
  #6 0x0000562125eaeb6e computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68aeb6e)
  #7 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
  #8 0x0000562125eaecd9 llvm::computeKnownBits(llvm::Value const*, llvm::DataLayout const&, unsigned int, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, llvm::OptimizationRemarkEmitter*, bool) (clang+0x68aecd9)
  #9 0x0000562125e47339 llvm::ScalarEvolution::GetMinTrailingZerosImpl(llvm::SCEV const*) (clang+0x6847339)
 #10 0x0000562125e3143a llvm::ScalarEvolution::GetMinTrailingZeros(llvm::SCEV const*) (clang+0x683143a)
 #11 0x0000562125e482ea llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) (clang+0x68482ea)
 #12 0x0000562125e3ba27 StrengthenNoWrapFlags(llvm::ScalarEvolution*, llvm::SCEVTypes, llvm::ArrayRef<llvm::SCEV const*>, llvm::SCEV::NoWrapFlags) (clang+0x683ba27)
 #13 0x0000562125e2facf llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int) (clang+0x682facf)
 #14 0x0000562125e3c19d llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683c19d)
 #15 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
 #16 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
 #17 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #18 0x0000562125e452f4 llvm::ScalarEvolution::createAddRecFromPHI(llvm::PHINode*) (clang+0x68452f4)
 #19 0x0000562125e46938 llvm::ScalarEvolution::createNodeForPHI(llvm::PHINode*) (clang+0x6846938)
 #20 0x0000562125e3ff14 llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683ff14)
 #21 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #22 0x0000562125e3bd47 llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683bd47)
 #23 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
 #24 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
 #25 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
 #26 0x0000562125e4dd1d llvm::ScalarEvolution::applyLoopGuards(llvm::SCEV const*, llvm::Loop const*) (clang+0x684dd1d)
...

I'm now working on a self-contained test case, but if you have any ideas for fixes, I can verify them on our real code.

Regards,
Alex

I've created a standalone test case for the issue:

Relevant compiler invocations:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q2.cc'.
4.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
5.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body85833'
...

I observe the problem on Linux on x86-64.

Please investigate the failure. If there's no obvious fix, please revert the patch while you're working on a solution.

A bit cleaner test case:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'q2.cc'.
4.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
5.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Alternatively, the corresponding IR file:

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
Stack dump:
0.      Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
1.      Code generation
2.      Running pass 'Function Pass Manager' on module 'q2.ll'.
3.      Running pass 'Loop Pass Manager' on function '@_Z1fv'
4.      Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...
fhahn marked an inline comment as done.Dec 14 2021, 3:51 AM

A bit cleaner test case:
(snip)

Thanks for the heads-up, I'll take a look now. But from the stack trace you shared, I doubt that the patch itself is causing the crash; it rather exposes an existing bug in GetMinTrailingZeros/computeKnownBits.

Thanks for the heads-up, I'll take a look now. But from the stack trace you shared, I doubt that the patch itself is causing the crash; it rather exposes an existing bug in GetMinTrailingZeros/computeKnownBits.

Right, I've seen this sort of thing a few times recently. I guess in a sufficiently complex system this is not a rare occurrence. Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

Do you see an obvious fix for the bug you mentioned or should we revert this patch while you're investigating?

Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

This isn't quite as clear-cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of introducing an unrelated change. However, the standard is not (and has never been) to fix any reported issue that happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged.

To be clear, I'm not commenting on which this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

This isn't quite as clear-cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of introducing an unrelated change. However, the standard is not (and has never been) to fix any reported issue that happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged.

To be clear, I'm not commenting on which this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Thanks for the comment. Note that I used "frequently", not "always". I'm not trying to impose any rules, just stating my observation.

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

Can you rephrase this? I'm not sure what you mean by "got dependencies", and am not sure what you're trying to ask here.

fhahn added a comment.Dec 15 2021, 6:36 AM

I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer, and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV for that condition, we compute the exit count along one of the code paths. This in turn applies the loop guards of an even earlier predecessor loop; to construct that SCEV, we again need to compute the exit count of that loop, apply its loop guards, and so on.

This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning to isolate a test that shows the improved analysis results and commit the fix soon.
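
As a heavily reduced, hypothetical illustration of the problematic shape (not the actual reproducer), consider a chain of loops in which each bound is the exit value of the previous loop; rewriting the guards of the last loop used to recurse through the exit counts of every earlier loop:

  // Hypothetical reduction: each loop's bound comes from the previous loop,
  // so applying the guard closest to the last loop first made SCEV recompute
  // exit counts all the way up the chain, deepening the call stack.
  int loop_chain(int n) {
    int a = 0, b = 0, c = 0;
    for (int i = 0; i < n; ++i) a += 2; // exit value of this loop ...
    for (int j = 0; j < a; ++j) b += 2; // ... bounds this one ...
    for (int k = 0; k < b; ++k) c += 2; // ... which bounds this one, etc.
    return c;
  }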

I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer, and this causes the stack overflow.
(snip)

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (compiler stack overflow). Would you like to have a look at them as well?

fhahn added a comment.Dec 15 2021, 8:23 AM

Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

(general discussion, not necessarily related to this patch)

As for rapid vs delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does it sound right to you?

Can you rephrase this? I'm not sure what you mean by "got dependencies", and am not sure what you're trying to ask here.

You mentioned quickly reported issues ("We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged."), and I wanted to gauge my understanding of this. I see multiple reasons why quickly reported issues may be handled differently:

  1. as time passes, the author of the patch may have switched context and it would take more effort to fix the issue;
  2. the patch that caused the issue may be more difficult to revert cleanly, or may even have changed significantly, when
    1. more patches have landed touching the same lines of code,
    2. other project code has started depending on the new interfaces or behavior introduced by the patch,
    3. something outside the project has started depending on the new APIs, features, or behavior introduced by the patch;
  3. the quicker the issue gets reported, the better the chances that it's a widespread issue with a higher impact.

Do these sound about right? Am I missing something else? What's your view on the relative importance of the factors above?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

(general discussion, not necessarily related to this patch)

(snip)
Do these sound about right? Am I missing something else? What's your view on the relative importance of the factors above?

You hit a bunch of important points. Here are a couple of extra ones:

  • We want incentives to encourage following upstream development. We don't want large amounts of testing happening far from ToT. We want to encourage testing as close to ToT as practical because that has the broadest community benefit.
  • Some of our contributors are volunteers. If they spent all of their time fixing issues reported with long delays, they may choose not to contribute. (This is not abstract: there are areas of LLVM I do not spend my volunteer time on because the overhead of working in that area is too high.)
  • Some of our contributors are employees. Generally, in most organizations, getting funding for open-ended maintenance is hard. Having bugs found quickly, so that fixing them is within the scope of the current project, makes them much more likely to be fixed (ever).

There are many, many interacting reasons here. None of them is "the reason"; all contribute in some way.


The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

Scratch that. It wasn't a valid experiment at all. Looking further.

Scratch that. It wasn't a valid experiment at all. Looking further.

Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time-tracker shows this to be compile-time neutral (or a tiny bit positive).

Scratch that. It wasn't a valid experiment at all. Looking further.

Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time-tracker shows this to be compile-time neutral (or a tiny bit positive).

https://reviews.llvm.org/rGf5f421e0eefa492545c3848e318c21ed04cb1ddd fixes all the problematic cases related to this patch. Compile time doesn't seem to be affected. Thanks!