Page MenuHomePhabricator

[IndVars] Remove monotonic checks with unknown exit count
ClosedPublic

Authored by mkazantsev on Sep 17 2020, 7:51 AM.

Details

Summary

Even if the exact exit count is unknown, we can still prove that this exit will not be taken.
If we can prove that the predicate is monotonic, fulfilled on first & last iteration, and no
overflow happened in between, then the check can be removed.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mkazantsev reclaimed this revision.Sep 21 2020, 2:40 AM
mkazantsev retitled this revision from [IndVars] Remove checks basing on iteration count to [IndVars] Remove monotonic checks with unknown exit count.
mkazantsev edited the summary of this revision. (Show Details)

Moved code to proper place.

This looks reasonable to me, but i'd prefer to defer to other reviewers.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2375

// Value of IV on *suggested* max possible last iteration.?

2386–2391

This seems fragile, i'd suggest

ICmpInst::Predicate NoOverflowPred = CmpInst::isSigned(Pred) ? ICmpInst::ICMP_SLE : ICmpInst::ICMP_ULE;
if (Step == MinusOne)
  NoOverflowPred = CmpInst::getSwappedPredicate(NoOverflowPred);
mkazantsev marked 2 inline comments as done.

Addressed comments & some minor refactoring.

IOW is this missing some precondition (potentially just a comment) that in signed case, the trip count is a positive number?

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351–2364

It might be good to hoist that into some ICmp method,
this is at least the second such code block.

2388–2398

I agree about unsigned case:

----------------------------------------
define i1 @src(i32 %base, i32 %num) {
%0:
  %t0 = uadd_overflow {i32, i1, i24} %base, %num
  %t1 = extractvalue {i32, i1, i24} %t0, 1
  %t2 = xor i1 %t1, 1
  ret i1 %t2
}
=>
define i1 @tgt(i32 %base, i32 %num) {
%0:
  %t0 = add i32 %base, %num
  %t1 = icmp ule i32 %base, %t0
  ret i1 %t1
}
Transformation seems to be correct!

but signed case seems to be different:

----------------------------------------
define i1 @src(i32 %base, i32 %num) {
%0:
  %t0 = sadd_overflow {i32, i1, i24} %base, %num
  %t1 = extractvalue {i32, i1, i24} %t0, 1
  %t2 = xor i1 %t1, 1
  ret i1 %t2
}
=>
define i1 @tgt(i32 %base, i32 %num) {
%0:
  %t0 = add i32 %base, %num
  %t1 = icmp sle i32 %base, %t0
  ret i1 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %base = #x34600ff0 (878710768)
i32 %num = #xcba01020 (3416264736, -878702560)

Source:
{i32, i1, i24} %t0 = { #x00002010 (8208), #x0 (0), poison }
i1 %t1 = #x0 (0)
i1 %t2 = #x1 (1)

Target:
i32 %t0 = #x00002010 (8208)
i1 %t1 = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)
----------------------------------------
define i1 @src(i32 %base, i31 %num_) {
%0:
  %num = zext i31 %num_ to i32
  %t0 = sadd_overflow {i32, i1, i24} %base, %num
  %t1 = extractvalue {i32, i1, i24} %t0, 1
  %t2 = xor i1 %t1, 1
  ret i1 %t2
}
=>
define i1 @tgt(i32 %base, i31 %num_) {
%0:
  %num = zext i31 %num_ to i32
  %t0 = add i32 %base, %num
  %t1 = icmp sle i32 %base, %t0
  ret i1 %t1
}
Transformation seems to be correct!
reames requested changes to this revision.Sep 28 2020, 10:11 AM
reames added inline comments.
llvm/include/llvm/Analysis/ScalarEvolution.h
935

For the context, use an instruction even if you don't actually need it yet. We made this mistake in LVI and spent years digging out. The problem with using the BB is you have to be very careful in documenting where in said block the fact holds.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351

It really looks like at least part of this logic can be replaced with SE->isMonotonicPredicate(LHSS, Pred, IncreasingOut). In particular, I think that covers all your tricky overflow logic and does so more generally.

If I'm right about that, you can also generalize this. Consider:

  • If is monotonic, and proven start == proven end (without caring which direction proved), the condition is invariant for the iterations executed.

This is starting to look a lot like it might belong in isLoopInvariantPredicate. It seems to be a generalization of the logic there.

This revision now requires changes to proceed.Sep 28 2020, 10:11 AM
mkazantsev added a comment.EditedSep 28 2020, 10:25 PM

Hi Roman,

I'm not sure I understand your counter-example for signed. In case of base = 878710768, num= -878702560 (aka 3416264736), step = 1, it should be last = base + step * num = 8208, and the transform should not happen because of NoOverflowPred which is base <=s last.

In case of base = 878710768, num= -878702560 (aka 3416264736), step = -1, it should be last = base + step * num = 1 757 413 328, and again the transform should not happen because of NoOverflowPred which is base >=s last (for step = -1).

The condition with NoOverflowPred is actually the more general version of "number of iterations is positive" that you mentioned.

For example, for base = - 2^31 and num = -100 (aka 2^32 - 100) it is OK to make such transform because iterating from SINT_MIN with this number of iterations does not overflow: last = -2^31 + 2^32 - 100 = 2^31 - 100 (positive).

mkazantsev added inline comments.Sep 28 2020, 11:11 PM
llvm/include/llvm/Analysis/ScalarEvolution.h
935

Just curious: what kind of mistake it was?

mkazantsev added inline comments.Sep 28 2020, 11:31 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351

It does not. It requires proved nsw/nuw flags. We are OK if the last iteration actually does overflow, but in this case we exit the loop before our check.

mkazantsev added inline comments.Sep 28 2020, 11:34 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351–2364

isRelational seems to be what we need.

  1. Replaced context API with instruction (added corresponding TODO);
  2. Simplified predicate check.
reames added inline comments.Sep 29 2020, 9:45 AM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351

Is it fair to say that what you're implementing is isMonotonicPredicate(Pred, RHS, MaxIteration)? And that you're specifically focused on exploiting the overflow outside of MaxIteration?

If so, this seems like it might come down to a missing overflow flag on the SCEV. SCEVs overflow flags hold for all current uses, so as long as the overflow value isn't used on the last iteration (or outside the loop), SCEV should be able to infer the lack of overflow.

mkazantsev planned changes to this revision.Sep 30 2020, 9:11 PM

Maybe this code should be moved into SCEV's function that does something similar (but more restrictively).

mkazantsev updated this revision to Diff 295748.Oct 2 2020, 1:10 AM

Factored out most logic into SCEV.

fhahn added inline comments.Oct 20 2020, 9:25 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9372

IIUC there a reason we cannot use isMonotonicPredicate here is that it is not using the information from MaxIter (which is the max exit count in the case of this patch)? Would it be possible move the logic to use the max exit count of a loop into isMonotonicPredicate?

Still digging through the main logic. In the meantime see some minor comments inline.

llvm/include/llvm/Analysis/ScalarEvolution.h
969

Style. Suggest to drop At from the name so as to make it more readable isLoopInvariantPredicateDuringFirstIterations.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2335

Looks like a separable enhancement.

2429–2442

Separable refactoring?

apilipenko added inline comments.Oct 22 2020, 5:00 PM
llvm/lib/Analysis/ScalarEvolution.cpp
9368

Do we actually need to pass MaxIter? We already have the loop, we can check the max iterations for the loop inside of this function.

If we make this interface change, isLoopInvariantPredicateAtDuringFirstIterations interface becomes very similar to isLoopInvariantPredicate. At that point we can add context to isLoopInvariantPredicate and implement the proposed logic in isLoopInvariantPredicate.

mkazantsev added inline comments.Oct 23 2020, 4:11 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9372

The main reason is that isMonotonicPredicate checks no-wrap flag, and we are interested in unsigned range for IV with negative step. Formally, it overflows on every iteration. So isMonotonicPredicate cannot deal with it.

Theoretically we could expand isMonotonicPredicate, making it smarter and able to handle this. But it is used in 3 different transforms, and such change would have unpredictable impact. So I'd rather do it separately.

mkazantsev added inline comments.Oct 23 2020, 4:14 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9368

We do not necessarily want to evaluate the last iteration. In some cases (and this is in follow-up patches), if we prove that there are two exits with same iteration count, the 2nd one will not execute on the last iteration. So we might want to check the condition for pre-last iteration for it.

mkazantsev added inline comments.Oct 23 2020, 4:16 AM
llvm/lib/Analysis/ScalarEvolution.cpp
9368

Besides, before this code was written, max iter computation was a costly operation (I've added caching for it just today). But per argument above, I don't think we want to compute it inside.

I agree that isLoopInvariantPredicate might be remade into calling this function with last iteration. Let's not to API changes in the patches that are not about API changes. It can go separately after this work has concluded.

mkazantsev added inline comments.Oct 25 2020, 9:17 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2429–2442

No. The old version does not use the max iter. The only difference of the new code is that it does.

mkazantsev added inline comments.Oct 25 2020, 9:26 PM
llvm/include/llvm/Analysis/ScalarEvolution.h
969

No. At here means that this predicate is not true everywhere, but at specific point.

mkazantsev added inline comments.Oct 25 2020, 10:20 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2335

Will do.

apilipenko added inline comments.Oct 25 2020, 10:24 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2429–2442

You can still extract the lambda and add a new parameter in the functional patch.

Updated function's name, removing At and adding the notion of the fact that it's not just a random predicate, but an exit condition.

mkazantsev added inline comments.Oct 26 2020, 12:36 AM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2351

It's more complex than that. We can't set nuw on SCEVAddRec with step -1. But we still can prove the condition {len,+,-1} <= u len is monotonic.

Allright...

apilipenko accepted this revision.Oct 26 2020, 5:23 PM

Accepting this change to unblock the progress.

Please, follow up with commoning this with the existing monotonic predicate functionality.

I plan expanding logic of this code to handle more cases. Once all planned enhancements are made, we can think about merging it with isMonotonicPredicate.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 26 2020, 9:36 PM
This revision was automatically updated to reflect the committed changes.

This breaks stage2 compilation on Green Dragon:

/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp
Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights, file /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/lib/Analysis/BranchProbabilityInfo.cpp, line 493.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Xclang -fmodules-local-submodule-visibility -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/GISel/AArch64RegisterBankInfo.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'CallGraph Pass Manager' on module '/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Target/AArch64/GISel/AArch64RegisterBankInfo.cpp'.
4.	Running pass 'Branch Probability Analysis' on function '@"_ZZN4llvm23AArch64RegisterBankInfoC1ERKNS_18TargetRegisterInfoEENK3$_2clEv"'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang++                  0x000000010f1b5a6b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 43
1  clang++                  0x000000010f1b4a95 llvm::sys::RunSignalHandlers() + 85
2  clang++                  0x000000010f1b51f2 llvm::sys::CleanupOnSignal(unsigned long) + 210
3  clang++                  0x000000010f0fdb0a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang++                  0x000000010f0fdc87 CrashRecoverySignalHandler(int) + 135
5  libsystem_platform.dylib 0x00007fff6b6575fd _sigtramp + 29
6  libsystem_platform.dylib 0x00000001205ae380 _sigtramp + 18446603343552146848
7  libsystem_c.dylib        0x00007fff6b529808 abort + 120
8  libsystem_c.dylib        0x00007fff6b528ac6 err + 0
9  clang++                  0x0000000111de3973 llvm::BranchProbabilityInfo::calcMetadataWeights(llvm::BasicBlock const*) (.cold.17) + 35
10 clang++                  0x000000010e2cdafa llvm::BranchProbabilityInfo::calcMetadataWeights(llvm::BasicBlock const*) + 2186
11 clang++                  0x000000010e2d0b64 llvm::BranchProbabilityInfo::calculate(llvm::Function const&, llvm::LoopInfo const&, llvm::TargetLibraryInfo const*, llvm::PostDominatorTree*) + 1316
12 clang++                  0x000000010e2d11e2 llvm::BranchProbabilityInfoWrapperPass::runOnFunction(llvm::Function&) + 322
13 clang++                  0x000000010ea0f1d4 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1092
14 clang++                  0x000000010e2fc3e4 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) + 1588
15 clang++                  0x000000010ea0f7ca llvm::legacy::PassManagerImpl::run(llvm::Module&) + 986
16 clang++                  0x000000010f424844 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 12820
17 clang++                  0x000000010f6ee76a clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1130
18 clang++                  0x00000001108bee63 clang::ParseAST(clang::Sema&, bool, bool) + 643
19 clang++                  0x000000010f9d5064 clang::FrontendAction::Execute() + 84
20 clang++                  0x000000010f977b83 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 2275
21 clang++                  0x000000010fa4d9d7 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1639
22 clang++                  0x000000010d202d35 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 2309
23 clang++                  0x000000010d200999 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) + 377
24 clang++                  0x000000010f811687 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const::$_1>(long) + 23
25 clang++                  0x000000010f0fda54 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) + 228
26 clang++                  0x000000010f810d3d clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const + 429
27 clang++                  0x000000010f7e39cd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const + 221
28 clang++                  0x000000010f7e3eed clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const + 125
29 clang++                  0x000000010f7fa9cc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) + 204
30 clang++                  0x000000010d2000c5 main + 10517
31 libdyld.dylib            0x00007fff6b45acc9 start + 1
32 libdyld.dylib            0x0000000000000034 start + 18446603338716435308
clang-12: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 12.0.0 (https://github.com/llvm/llvm-project.git 673f2f702b03be8c003889cbb5923e111c3e24d0)
Target: x86_64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin
clang-12: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.cpp
clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.cache
clang-12: note: diagnostic msg: /var/folders/09/r4vw4v8n5kb67jl66zvlbljw0000gn/T/AArch64RegisterBankInfo-189543.sh
clang-12: note: diagnostic msg: Crash backtrace is located in
clang-12: note: diagnostic msg: /Users/buildslave/Library/Logs/DiagnosticReports/clang-12_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-12: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang-12: note: diagnostic msg:

I uploaded a reproducer for the crash here: https://teemperor.de/pub/clang-crash-D87832-repro.zip

I'll revert this in the meantime to get the bots running again.

Reverted this and a0d84d80315d0c725b5efcd889928bad1171ba56 (as it seems to be a follow-up cleanup patch) in e038b60d9169733367393f733058f0ff23c28d3f

This assert is highly confusing. I'll investigate.

mkazantsev reopened this revision.Oct 28 2020, 3:45 AM

I was able to successfully build clang stage 2. The attached repro doesn't reproduce the failure either.

The failure doesn't reproruce and looks related to this story:

commit 2a4e704c92e8ec3d9217a7333368ea53cf3a583f
Author: Nico Weber <thakis@chromium.org>
Date:   Tue Oct 27 09:18:42 2020 -0400

    Revert "Use uint64_t for branch weights instead of uint32_t"

    This reverts commit e5766f25c62c185632e3a75bf45b313eadab774b.
    Makes clang assert when building Chromium, see https://crbug.com/1142813
    for a repro.

commit e5766f25c62c185632e3a75bf45b313eadab774b
Author: Arthur Eubanks <aeubanks@google.com>
Date:   Wed Sep 30 12:11:46 2020 -0700

    Use uint64_t for branch weights instead of uint32_t

    CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
    To be more consistent everywhere and remove lots of casts from uint64_t
    to uint32_t, use i64 for branch_weights.

    Reviewed By: davidxl

    Differential Revision: https://reviews.llvm.org/D88609

I am returning the change as the revert was erroneous.

@teemperor please make sure you're reverting the patch that actually causes the failure...

This revision was not accepted when it landed; it landed in state Needs Review.Oct 28 2020, 4:52 AM
This revision was automatically updated to reflect the committed changes.

@teemperor please make sure you're reverting the patch that actually causes the failure...

Apologies, not sure how my bisecting found your commit...