Similar to other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.
Fixes PR52464.
Differential D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps.
Authored by fhahn on Nov 10 2021, 9:33 AM.
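For illustration, here is a hedged, hypothetical C++ example of the kind of pattern this change improves (not a test case from the patch): the dominating guard bounds n, so applying loop guards can tighten the maximum backedge-taken count even though the step is neither 1 nor -1.

```cpp
// Hypothetical example (not from the patch). Without the guard, the
// max backedge-taken count derived from the range of n is roughly
// UINT_MAX / 3; with the guard n <= 100 applied, it drops to about 34.
void clearEveryThird(int *a, unsigned n) {
  if (n > 100)
    return; // loop guard: past this point, n <= 100 holds
  for (unsigned i = 0; i < n; i += 3) // arbitrary step (not 1 or -1)
    a[i] = 0;
}
```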
Comment Actions I'd rather move the assert inside applyLoopGuards. If it breaks, that's a subject for improving our reasoning. I can imagine it could break because of issues with caching (e.g. if addrecs are involved), though. So, unless the failures are massive and disruptive, I'd prefer it to stay and catch opportunities for us.

Comment Actions We already know that the assertion is going to fail, and we also know why. See https://reviews.llvm.org/D102267#inline-967507 for two suggestions on how to avoid the pessimization, or at least the known cases of it. This should either be addressed first, or the assert should be omitted.

Comment Actions Replaced the over-aggressive assertion with selecting the minimum max BTC. Added a test case where the assert is violated in 69c1cbe20f5d.
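To make "selecting the minimum max BTC" concrete, here is a hedged sketch; pickConstantMaxBTC is a hypothetical helper and not the actual diff, although applyLoopGuards, getUnsignedRangeMax, and APIntOps::umin are existing LLVM APIs.

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"

using namespace llvm;

// Hedged sketch (hypothetical helper, not the actual diff): take the
// minimum of the plain and guard-refined constant max backedge-taken
// counts instead of asserting that the guarded bound is never worse.
static APInt pickConstantMaxBTC(ScalarEvolution &SE, const SCEV *Distance,
                                const Loop *L) {
  APInt MaxBTC = SE.getUnsignedRangeMax(Distance);
  APInt GuardedMaxBTC =
      SE.getUnsignedRangeMax(SE.applyLoopGuards(Distance, L));
  // applyLoopGuards can occasionally pessimize the computed range, so
  // selecting the smaller of the two bounds is safe where an
  // assert(GuardedMaxBTC.ule(MaxBTC)) is not.
  return APIntOps::umin(MaxBTC, GuardedMaxBTC);
}
```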
Comment Actions LG
Comment Actions LGTM as well. Amusingly, I'd stumbled into a variant of this as well just yesterday. :)
Comment Actions Hi Florian, this revision is causing compiler crashes on a number of translation units in our code. This may be due to an increase in stack depth, but I'm not sure yet. Is significantly increased stack depth expected after this patch? Is it justifiable? The failures look more or less like this:

1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module '...'.
4. Running pass 'Loop Pass Manager' on function '@...'
5. Running pass 'Induction Variable Users' on basic block '%107409'
#0 0x0000562126256c38 llvm::sys::RunSignalHandlers() (clang+0x6c56c38)
#1 0x000056212625926c SignalHandler(int) (clang+0x6c5926c)
#2 0x00007faea56f9750 __restore_rt (libpthread.so.0+0x15750)
#3 0x0000562125ec3b1b computeKnownBitsFromAssume(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c3b1b)
#4 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
#5 0x0000562125ec2de5 computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68c2de5)
#6 0x0000562125eaeb6e computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68aeb6e)
#7 0x0000562125eae55d computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) (clang+0x68ae55d)
#8 0x0000562125eaecd9 llvm::computeKnownBits(llvm::Value const*, llvm::DataLayout const&, unsigned int, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, llvm::OptimizationRemarkEmitter*, bool) (clang+0x68aecd9)
#9 0x0000562125e47339 llvm::ScalarEvolution::GetMinTrailingZerosImpl(llvm::SCEV const*) (clang+0x6847339)
#10 0x0000562125e3143a llvm::ScalarEvolution::GetMinTrailingZeros(llvm::SCEV const*) (clang+0x683143a)
#11 0x0000562125e482ea llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) (clang+0x68482ea)
#12 0x0000562125e3ba27 StrengthenNoWrapFlags(llvm::ScalarEvolution*, llvm::SCEVTypes, llvm::ArrayRef<llvm::SCEV const*>, llvm::SCEV::NoWrapFlags) (clang+0x683ba27)
#13 0x0000562125e2facf llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int) (clang+0x682facf)
#14 0x0000562125e3c19d llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683c19d)
#15 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
#16 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
#17 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
#18 0x0000562125e452f4 llvm::ScalarEvolution::createAddRecFromPHI(llvm::PHINode*) (clang+0x68452f4)
#19 0x0000562125e46938 llvm::ScalarEvolution::createNodeForPHI(llvm::PHINode*) (clang+0x6846938)
#20 0x0000562125e3ff14 llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683ff14)
#21 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
#22 0x0000562125e3bd47 llvm::ScalarEvolution::getGEPExpr(llvm::GEPOperator*, llvm::SmallVectorImpl<llvm::SCEV const*> const&) (clang+0x683bd47)
#23 0x0000562125e46e7e llvm::ScalarEvolution::createNodeForGEP(llvm::GEPOperator*) (clang+0x6846e7e)
#24 0x0000562125e3fe3d llvm::ScalarEvolution::createSCEV(llvm::Value*) (clang+0x683fe3d)
#25 0x0000562125e38c60 llvm::ScalarEvolution::getSCEV(llvm::Value*) (clang+0x6838c60)
#26 0x0000562125e4dd1d llvm::ScalarEvolution::applyLoopGuards(llvm::SCEV const*, llvm::Loop const*) (clang+0x684dd1d)
...

I'm now working on a self-contained test case, but if you have any ideas for fixes, I can verify them on our real code. Regards,

Comment Actions I've created a standalone test case for the issue. Relevant compiler invocations:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
Stack dump:
0. Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -fcolor-diagnostics -xc++ -std=c++17 -w -fsized-deallocation -O2 q2.cc
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'q2.cc'.
4. Running pass 'Loop Pass Manager' on function '@_Z1fv'
5. Running pass 'Induction Variable Users' on basic block '%arraydestroy.body85833'
...

Comment Actions I observe the problem on Linux on x86-64. Please investigate the failure. If there's no obvious fix, please revert the patch while you're working on a solution.

Comment Actions A bit cleaner test case:

$ ./clang-before -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
Stack dump:
0. Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.cc
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'q2.cc'.
4. Running pass 'Loop Pass Manager' on function '@_Z1fv'
5. Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Alternatively, the corresponding IR file:

$ ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
Stack dump:
0. Program arguments: ./clang-after -cc1 -triple x86_64-unknown-linux -S -target-feature +sse4.2 -O2 q2.ll
1. Code generation
2. Running pass 'Function Pass Manager' on module 'q2.ll'.
3. Running pass 'Loop Pass Manager' on function '@_Z1fv'
4. Running pass 'Induction Variable Users' on basic block '%arraydestroy.body86029'
...

Comment Actions Thanks for the heads-up, I'll take a look now. From the stack trace you shared, though, I doubt that the patch itself is causing the crash; it rather exposes an existing bug in GetMinTrailingZeros/computeKnownBits.

Comment Actions Right, I've seen this sort of stuff a few times recently. I guess in a sufficiently complex system this is not a rare occasion. Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this). Do you see an obvious fix for the bug you mentioned, or should we revert this patch while you're investigating?

Comment Actions This isn't quite as clear cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of introducing an unrelated change. However, the standard is not (and has never been) any reported issue which happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer.
That's one of the reasons rapid reporting is so strongly encouraged. To be clear, I'm not commenting on which of these this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Comment Actions Thanks for the comment. Note that I used "frequently", not "always". I'm not trying to impose any rules, just stating my observation. As for rapid vs. delayed reporting, IMO, the main difference is whether the patch got dependencies (in LLVM code, in downstream LLVM dependencies, in code compiled by LLVM). Does that sound right to you?

Comment Actions Can you rephrase this? I'm not sure what you mean by "got dependencies", and I'm not sure what you're trying to ask here.

Comment Actions I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer, and this causes the stack overflow. The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV, we compute the exit count along one of the code paths. This in turn applies the loop guards of an earlier predecessor loop; to construct that SCEV, we again need to compute the exit count of that loop, apply its loop guards, and so on. This can be fixed by adjusting the order in which we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first, no excessive call chain is needed. This seems to have a tiny positive compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some of them, and so far it looks like the new order makes applying loop guards slightly more effective. I am planning to isolate a test that shows the improved analysis results and commit the fix soonish.
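To make the reordering concrete, here is a hedged sketch of the idea from the comment above. collectGuardConditions and rewriteWithGuard are hypothetical stand-ins for the collection and rewrite steps inside ScalarEvolution::applyLoopGuards, not actual LLVM APIs; llvm::reverse and the SmallVector types are real.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Hypothetical stand-ins for the guard collection and rewrite steps
// inside ScalarEvolution::applyLoopGuards; the names are illustrative.
void collectGuardConditions(const Loop *L, SmallVectorImpl<Value *> &Guards);
const SCEV *rewriteWithGuard(ScalarEvolution &SE, const SCEV *Expr,
                             Value *GuardCond);

const SCEV *applyGuardsEarliestFirst(ScalarEvolution &SE, const SCEV *Expr,
                                     const Loop *L) {
  SmallVector<Value *, 8> Guards;
  collectGuardConditions(L, Guards); // collected closest-to-L first
  // Rewrite using the earliest (furthest-from-L) guard first. Starting
  // with the closest guard can force SCEV construction for an earlier
  // loop's exit count, which applies that loop's guards in turn,
  // recursing through the whole loop chain and overflowing the stack.
  for (Value *G : reverse(Guards))
    Expr = rewriteWithGuard(SE, Expr, G);
  return Expr;
}
```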
Comment Actions Thanks! I guess I can come up with a couple of distinct test cases that have the same effect (compiler stack overflow). Would you like to have a look at them as well?

Comment Actions The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

Comment Actions (general discussion, not necessarily related to this patch) You mentioned quickly reported issues ("We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged."), and I wanted to gauge my understanding of this. I see multiple reasons why quickly reported issues may be handled differently:
[...]
Do these sound about right? Am I missing something else? What's your view on the relative importance of the factors above?

Comment Actions The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

Comment Actions You hit a bunch of important points. Here are a couple extra:
[...]
There are many, many interacting reasons here. None of them is "the reason"; all contribute in some way.

Comment Actions Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time tracker shows it to be compile-time neutral (or a tiny bit positive).

Comment Actions https://reviews.llvm.org/rGf5f421e0eefa492545c3848e318c21ed04cb1ddd fixes all the problematic cases related to this patch. Compile time doesn't seem to be affected. Thanks!
Same as in the other cases, this assertion likely doesn't always hold.