Even if the exact exit count is unknown, we can still prove that this exit will not be taken.
If we can prove that the predicate is monotonic, fulfilled on first & last iteration, and no
overflow happened in between, then the check can be removed.
Details
Diff Detail
Event Timeline
This looks reasonable to me, but i'd prefer to defer to other reviewers.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2398 | // Value of IV on *suggested* max possible last iteration.? | |
2409–2414 | This seems fragile, i'd suggest ICmpInst::Predicate NoOverflowPred = CmpInst::isSigned(Pred) ? ICmpInst::ICMP_SLE : ICmpInst::ICMP_ULE; if (Step == MinusOne) NoOverflowPred = CmpInst::getSwappedPredicate(NoOverflowPred); |
IOW is this missing some precondition (potentially just a comment) that in signed case, the trip count is a positive number?
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2374–2387 | It might be good to hoist that into some ICmp method, | |
2411–2421 | I agree about unsigned case: ---------------------------------------- define i1 @src(i32 %base, i32 %num) { %0: %t0 = uadd_overflow {i32, i1, i24} %base, %num %t1 = extractvalue {i32, i1, i24} %t0, 1 %t2 = xor i1 %t1, 1 ret i1 %t2 } => define i1 @tgt(i32 %base, i32 %num) { %0: %t0 = add i32 %base, %num %t1 = icmp ule i32 %base, %t0 ret i1 %t1 } Transformation seems to be correct! but signed case seems to be different: ---------------------------------------- define i1 @src(i32 %base, i32 %num) { %0: %t0 = sadd_overflow {i32, i1, i24} %base, %num %t1 = extractvalue {i32, i1, i24} %t0, 1 %t2 = xor i1 %t1, 1 ret i1 %t2 } => define i1 @tgt(i32 %base, i32 %num) { %0: %t0 = add i32 %base, %num %t1 = icmp sle i32 %base, %t0 ret i1 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %base = #x34600ff0 (878710768) i32 %num = #xcba01020 (3416264736, -878702560) Source: {i32, i1, i24} %t0 = { #x00002010 (8208), #x0 (0), poison } i1 %t1 = #x0 (0) i1 %t2 = #x1 (1) Target: i32 %t0 = #x00002010 (8208) i1 %t1 = #x0 (0) Source value: #x1 (1) Target value: #x0 (0) ---------------------------------------- define i1 @src(i32 %base, i31 %num_) { %0: %num = zext i31 %num_ to i32 %t0 = sadd_overflow {i32, i1, i24} %base, %num %t1 = extractvalue {i32, i1, i24} %t0, 1 %t2 = xor i1 %t1, 1 ret i1 %t2 } => define i1 @tgt(i32 %base, i31 %num_) { %0: %num = zext i31 %num_ to i32 %t0 = add i32 %base, %num %t1 = icmp sle i32 %base, %t0 ret i1 %t1 } Transformation seems to be correct! |
llvm/include/llvm/Analysis/ScalarEvolution.h | ||
---|---|---|
923 | For the context, use an instruction even if you don't actually need it yet. We made this mistake in LVI and spent years digging out. The problem with using the BB is you have to be very careful in documenting where in said block the fact holds. | |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
2374 | It really looks like at least part of this logic can be replaced with SE->isMonotonicPredicate(LHSS, Pred, IncreasingOut). In particular, I think that covers all your tricky overflow logic and does so more generally. If I'm right about that, you can also generalize this. Consider:
This is starting to look a lot like it might belong in isLoopInvariantPredicate. It seems to be a generalization of the logic there. |
Hi Roman,
I'm not sure I understand your counter-example for signed. In case of base = 878710768, num= -878702560 (aka 3416264736), step = 1, it should be last = base + step * num = 8208, and the transform should not happen because of NoOverflowPred which is base <=s last.
In case of base = 878710768, num= -878702560 (aka 3416264736), step = -1, it should be last = base + step * num = 1 757 413 328, and again the transform should not happen because of NoOverflowPred which is base >=s last (for step = -1).
The condition with NoOverflowPred is actually the more general version of "number of iterations is positive" that you mentioned.
For example, for base = - 2^31 and num = -100 (aka 2^32 - 100) it is OK to make such transform because iterating from SINT_MIN with this number of iterations does not overflow: last = -2^31 + 2^32 - 100 = 2^31 - 100 (positive).
llvm/include/llvm/Analysis/ScalarEvolution.h | ||
---|---|---|
923 | Just curious: what kind of mistake it was? |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2374 | It does not. It requires proved nsw/nuw flags. We are OK if the last iteration actually does overflow, but in this case we exit the loop before our check. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2374–2387 | isRelational seems to be what we need. |
- Replaced context API with instruction (added corresponding TODO);
- Simplified predicate check.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2374 | Is it fair to say that what you're implementing is isMonotonicPredicate(Pred, RHS, MaxIteration)? And that you're specifically focused on exploiting the overflow outside of MaxIteration? If so, this seems like it might come down to a missing overflow flag on the SCEV. SCEVs overflow flags hold for all current uses, so as long as the overflow value isn't used on the last iteration (or outside the loop), SCEV should be able to infer the lack of overflow. |
Maybe this code should be moved into SCEV's function that does something similar (but more restrictively).
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
9252 | IIUC there a reason we cannot use isMonotonicPredicate here is that it is not using the information from MaxIter (which is the max exit count in the case of this patch)? Would it be possible move the logic to use the max exit count of a loop into isMonotonicPredicate? |
Still digging through the main logic. In the meantime see some minor comments inline.
llvm/include/llvm/Analysis/ScalarEvolution.h | ||
---|---|---|
957 | Style. Suggest to drop At from the name so as to make it more readable isLoopInvariantPredicateDuringFirstIterations. | |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
2358 | Looks like a separable enhancement. | |
2492–2505 | Separable refactoring? |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
9248 | Do we actually need to pass MaxIter? We already have the loop, we can check the max iterations for the loop inside of this function. If we make this interface change, isLoopInvariantPredicateAtDuringFirstIterations interface becomes very similar to isLoopInvariantPredicate. At that point we can add context to isLoopInvariantPredicate and implement the proposed logic in isLoopInvariantPredicate. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
9252 | The main reason is that isMonotonicPredicate checks no-wrap flag, and we are interested in unsigned range for IV with negative step. Formally, it overflows on every iteration. So isMonotonicPredicate cannot deal with it. Theoretically we could expand isMonotonicPredicate, making it smarter and able to handle this. But it is used in 3 different transforms, and such change would have unpredictable impact. So I'd rather do it separately. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
9248 | We do not necessarily want to evaluate the last iteration. In some cases (and this is in follow-up patches), if we prove that there are two exits with same iteration count, the 2nd one will not execute on the last iteration. So we might want to check the condition for pre-last iteration for it. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
9248 | Besides, before this code was written, max iter computation was a costly operation (I've added caching for it just today). But per argument above, I don't think we want to compute it inside. I agree that isLoopInvariantPredicate might be remade into calling this function with last iteration. Let's not to API changes in the patches that are not about API changes. It can go separately after this work has concluded. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2492–2505 | No. The old version does not use the max iter. The only difference of the new code is that it does. |
llvm/include/llvm/Analysis/ScalarEvolution.h | ||
---|---|---|
957 | No. At here means that this predicate is not true everywhere, but at specific point. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2358 | Will do. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2492–2505 | You can still extract the lambda and add a new parameter in the functional patch. |
Updated function's name, removing At and adding the notion of the fact that it's not just a random predicate, but an exit condition.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
2374 | It's more complex than that. We can't set nuw on SCEVAddRec with step -1. But we still can prove the condition {len,+,-1} <= u len is monotonic. |
Accepting this change to unblock the progress.
Please, follow up with commoning this with the existing monotonic predicate functionality.
I plan expanding logic of this code to handle more cases. Once all planned enhancements are made, we can think about merging it with isMonotonicPredicate.
This breaks stage2 compilation on Green Dragon:
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights, file /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/lib/Analysis/BranchProbabilityInfo.cpp, line 493. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp 1. <eof> parser at end of file 2. Per-module optimization passes 3. Running pass 'CallGraph Pass Manager' on module '/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp'. 4. Running pass 'Branch Probability Analysis' on function '@"_ZZN4llvm23AArch64RegisterBankInfoC1ERKNS_18TargetRegisterInfoEENK3$_2clEv"' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 clang++ 0x000000010f1b5a6b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43 1 clang++ 0x000000010f1b4a95 llvm::sys::RunSignalHandlers() + 85 2 clang++ 0x000000010f1b51f2 llvm::sys::CleanupOnSignal(unsigned long) + 210 3 clang++ 0x000000010f0fdb0a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106 4 clang++ 0x000000010f0fdc87 CrashRecoverySignalHandler(int) + 135 5 libsystem_platform.dylib 0x00007fff6b6575fd _sigtramp + 29 6 libsystem_platform.dylib 0x00000001205ae380 _sigtramp + 18446603343552146848 7 libsystem_c.dylib 0x00007fff6b529808 abort + 120 8 libsystem_c.dylib 0x00007fff6b528ac6 err + 0 9 clang++ 0x0000000111de3973 llvm::BranchProbabilityInfo::calcMetadataWeights(llvm::BasicBlock const*) (.cold.17) + 35 10 clang++ 0x000000010e2cdafa llvm::BranchProbabilityInfo::calcMetadataWeights(llvm::BasicBlock const*) + 2186 11 clang++ 0x000000010e2d0b64 llvm::BranchProbabilityInfo::calculate(llvm::Function const&, llvm::LoopInfo const&, llvm::TargetLibraryInfo const*, llvm::PostDominatorTree*) + 1316 12 clang++ 0x000000010e2d11e2 llvm::BranchProbabilityInfoWrapperPass::runOnFunction(llvm::Function&) + 322 13 clang++ 0x000000010ea0f1d4 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092 14 clang++ 0x000000010e2fc3e4 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) + 1588 15 clang++ 0x000000010ea0f7ca llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986 16 clang++ 0x000000010f424844 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 12820 17 clang++ 0x000000010f6ee76a clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1130 18 clang++ 0x00000001108bee63 clang::ParseAST(clang::Sema&, bool, bool) + 643 19 clang++ 0x000000010f9d5064 clang::FrontendAction::Execute() + 84 20 clang++ 0x000000010f977b83 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 2275 21 clang++ 0x000000010fa4d9d7 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1639 22 clang++ 0x000000010d202d35 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 2309 23 clang++ 0x000000010d200999 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) + 377 24 clang++ 0x000000010f811687 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) + 23 25 clang++ 0x000000010f0fda54 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228 26 clang++ 0x000000010f810d3d clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const + 429 27 clang++ 0x000000010f7e39cd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const + 221 28 clang++ 0x000000010f7e3eed clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const + 125 29 clang++ 0x000000010f7fa9cc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) + 204 30 clang++ 0x000000010d2000c5 main + 10517 31 libdyld.dylib 0x00007fff6b45acc9 start + 1 32 libdyld.dylib 0x0000000000000034 start + 18446603338716435308 clang-12: error: clang frontend command failed with exit code 134 (use -v to see invocation) clang version 12.0.0 (https://github.com/llvm/llvm-project.git 673f2f702b03be8c003889cbb5923e111c3e24d0) Target: x86_64-apple-darwin19.5.0 Thread model: posix InstalledDir: /Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin clang-12: note: diagnostic msg: ******************** PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: Preprocessed source(s) and associated run script(s) are located at: clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.cpp clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.cache clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.sh clang-12: note: diagnostic msg: Crash backtrace is located in clang-12: note: diagnostic msg: /Users/buildslave/Library/Logs/DiagnosticReports/clang-12_<YYYY-MM-DD-HHMMSS>_<hostname>.crash clang-12: note: diagnostic msg: (choose the .crash file that corresponds to your crash) clang-12: note: diagnostic msg:
I uploaded a reproducer for the crash here: https://teemperor.de/pub/clang-crash-D87832-repro.zip
I'll revert this in the meantime to get the bots running again.
Reverted this and a0d84d80315d0c725b5efcd889928bad1171ba56 (as it seems to be a follow-up cleanup patch) in e038b60d9169733367393f733058f0ff23c28d3f
I was able to successfully build clang stage 2. The attached repro doesn't reproduce the failure either.
The failure doesn't reproruce and looks related to this story:
commit 2a4e704c92e8ec3d9217a7333368ea53cf3a583f Author: Nico Weber <thakis@chromium.org> Date: Tue Oct 27 09:18:42 2020 -0400 Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro. commit e5766f25c62c185632e3a75bf45b313eadab774b Author: Arthur Eubanks <aeubanks@google.com> Date: Wed Sep 30 12:11:46 2020 -0700 Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609
I am returning the change as the revert was erroneous.
@teemperor please make sure you're reverting the patch that actually causes the failure...
For the context, use an instruction even if you don't actually need it yet. We made this mistake in LVI and spent years digging out. The problem with using the BB is you have to be very careful in documenting where in said block the fact holds.