This is an archive of the discontinued LLVM Phabricator instance.

[SCEV][reland] More precise trip multiples
ClosedPublic

Authored by caojoshua on Apr 29 2023, 1:24 PM.

Details

Summary

We currently have getMinTrailingZeros(), from which we can get a SCEV's
multiple by computing 1 << MinTrailingZeroes. However, this only gets us
multiples that are a power of 2. This patch introduces a way to get max
constant multiples that are not just a power of 2. The logic is similar
to that of getMinTrailingZeros. getMinTrailingZerosImpl is replaced by
computing the max constant multiple, and counting the number of trailing
bits.

I have so far found this useful in two places:

  1. Computing unsigned constant ranges. For example, if we have i8 {10,+,10}<nuw>, we know the max constant it can be is 250.
  1. My original intent was to use this in getSmallConstantTripMultiples, but it has no effect right now due to change from D110587. For example, if we have backedge count (6 * %N) - 1, the trip count becomes 1 + zext((6 * %N) - 1), and we cannot say that 6 is a multiple of the SCEV. I plan to look further into this separately.

The implementation assumes the value is unsigned. It can probably be
extended to handle signed values as well.

If the code sees that a SCEV does not have <nuw>, it will fall back to
finding the max multiple that is a power of 2. Multiples that are a
power of 2 will still be a multiple even after the SCEV overflows. This
does not apply to other values. This is the 1st commit message:


This relands https://reviews.llvm.org/D141823. The verification fails
when expensive checks are turned on. This can occur when:

  1. SCEV S's multiple is cached
  2. SCEV S's no wrap flags are strengthened, and the multiple changes
  3. SCEV verifier finds that S's cached and recomputed multiple are different

We eliminate most cases by forgetting SCEVAddRecExpr's cached values
when the flags are modified, but there are still cases for other SCEV
types. We relax the check by making sure the cached multiple divides the
recomputed multiple, ensuring the cached multiple is correct,
conservative multiple.

Diff Detail

Event Timeline

caojoshua created this revision.Apr 29 2023, 1:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2023, 1:24 PM

An interesting non-SCEVAddRecExpr case that failures the verification is IndVarSimplify/udiv.ll. In verification, we try to compute the multiple of (8193 smax {6,+,3}<nuw><%for.body15>), which is 3. When originally caching the value for the SCEV, we had (8193 smax {6,+,3}<%for.body15>), which due to the lack of nuw, has a multiple of 1.

The only write operation to a SCEV that can affect its multiple is its wrap flags, or the wrap flags of its operands. The operands themselves never change. If only the flags change, the originally cached multiple should divide the recomputed multiple.

These cases are rare, and I'd say its ok for the cached multiple to not always be the best answer, and sometimes be a conservative answer.

caojoshua published this revision for review.Apr 29 2023, 2:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2023, 2:13 PM
mkazantsev accepted this revision.May 4 2023, 9:28 PM

Do we know if we can forget smth when strenghtening flags without harming the compile time to get the more precise resutlt in these cases?

This revision is now accepted and ready to land.May 4 2023, 9:28 PM

Do we know if we can forget smth when strenghtening flags without harming the compile time to get the more precise resutlt in these cases?

In this patch we forget it specifically for getRecAddExpr's, which accounts for most use cases and should not be expensive. For other less common use cases, forgetting would be too expensive. In the example I have above:

An interesting non-SCEVAddRecExpr case that failures the verification is IndVarSimplify/udiv.ll. In verification, we try to compute the multiple of (8193 smax {6,+,3}<nuw><%for.body15>), which is 3. When originally caching the value for the SCEV, we had (8193 smax {6,+,3}<%for.body15>), which due to the lack of nuw, has a multiple of 1.

When we add the addition, we would need to forget the addition and all uses of the addition as well, which is the smax in this case. In worse case, there can be long chains of SCEV users that would need to be forgotten. I'd say its too compile-time expensive for very little benefit.

This revision was landed with ongoing or failed builds.May 7 2023, 10:02 PM
This revision was automatically updated to reflect the committed changes.

Hello,

The following starts crashing with this commit:

opt -disable-loop-unrolling -verify-scev -passes="module(default<Os>)" bbi-83087.ll -o /dev/null

It crashes with

opt: ../lib/Support/APInt.cpp:1667: llvm::APInt llvm::APInt::urem(const llvm::APInt &) const: Assertion `RHS.U.VAL != 0 && "Remainder by zero?"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/opt -disable-loop-unrolling -verify-scev -passes=module(default<Os>) bbi-83087.ll -o /dev/null
 #0 0x0000561a0f6c6ec7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2d8eec7)
 #1 0x0000561a0f6c4bee llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2d8cbee)
 #2 0x0000561a0f6c755f SignalHandler(int) (../../main-github/llvm/build-all/bin/opt+0x2d8f55f)
 #3 0x00007f0da7660630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f0da4da7387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f0da4da8a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007f0da4da01a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007f0da4da0252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000561a0f61d076 llvm::APInt::urem(llvm::APInt const&) const (../../main-github/llvm/build-all/bin/opt+0x2ce5076)
 #9 0x0000561a0e87bcb8 llvm::ScalarEvolution::verify() const (../../main-github/llvm/build-all/bin/opt+0x1f43cb8)
#10 0x0000561a101f6d06 llvm::FunctionToLoopPassAdaptor::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x38bed06)
#11 0x0000561a0f8e7d5d llvm::detail::PassModel<llvm::Function, llvm::FunctionToLoopPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x2fafd5d)
#12 0x0000561a0f0a5e84 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x276de84)
#13 0x0000561a0d55aa3d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0xc22a3d)
#14 0x0000561a0e677c0f llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0x1d3fc0f)
#15 0x0000561a0d55c5cd llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0xc245cd)
#16 0x0000561a0e6724be llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0x1d3a4be)
#17 0x0000561a0f8d032d llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0x2f9832d)
#18 0x0000561a0e675dd5 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0x1d3ddd5)
#19 0x0000561a0f8e9d5d llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (../../main-github/llvm/build-all/bin/opt+0x2fb1d5d)
#20 0x0000561a0e6744ca llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x1d3c4ca)
#21 0x0000561a0f8d05cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2f985cd)
#22 0x0000561a0f0a5014 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x276d014)
#23 0x0000561a0f9f437d llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x30bc37d)
#24 0x0000561a0f8d6e8d llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2f9ee8d)
#25 0x0000561a0f0a5014 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x276d014)
#26 0x0000561a0d196e8e llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x85ee8e)
#27 0x0000561a0d1a555e main (../../main-github/llvm/build-all/bin/opt+0x86d55e)
#28 0x00007f0da4d93555 __libc_start_main (/lib64/libc.so.6+0x22555)
#29 0x0000561a0d1915e0 _start (../../main-github/llvm/build-all/bin/opt+0x8595e0)
Abort (core dumped)

dstenb added a subscriber: dstenb.May 30 2023, 9:19 AM
llvm/lib/Analysis/ScalarEvolution.cpp
14302

It's this urem call that crashes in the comment I made yesterday.

Multiple is 0 and doing urem with RHS being 0 hits the assertion since dividing by 0 isn't good.

Are we perhaps missing a negation of the condition

(Multiple == 0 || RecomputedMultiple == 0)

?
Now we do the urem(Multiple) specifically if Multiple is 0, which we should avoid.

caojoshua added inline comments.May 31 2023, 12:25 AM
llvm/lib/Analysis/ScalarEvolution.cpp
14302

Its due to returning a zero from too many trailing zeros. I am going to rewrite this a bit, it should be verifying on getNonZeroConstantMultiple().

I am going to run this through expensive checks.

caojoshua added inline comments.May 31 2023, 2:11 AM
llvm/lib/Analysis/ScalarEvolution.cpp
14302

Correction: I looked at things wrong. The current verification requires that recomputed multiples are stronger than previous multiples, but that is not the case here. It turns out a multiple can become weaker if due to dependence on ComputeKnownBits(). As the IR transforms, its possible that ComputeKnownBits() becomes weaker due to limitations in depth.

I have local changes that relaxes the verification and passes the provided test case. Testing with expensive checks takes a long time on my machine and I won't be able to push today.

caojoshua marked an inline comment as done.May 31 2023, 9:04 PM
caojoshua added inline comments.