This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Construct SCEV iteratively.
ClosedPublic

Authored by fhahn on Nov 26 2021, 10:09 AM.

Details

Summary

This patch updates SCEV construction to work iteratively instead of recursively
in most cases. It resolves stack overflow issues when trying to construct SCEVs
for certain inputs, e.g. PR45201.

The basic approach is to to use a worklist to queue operands of V which
need to be created before V. To do so, the current patch adds a
getOperandsToCreate function which collects the operands SCEV
construction depends on for a given value. This is a slight duplication
with createSCEV.

At the moment, SCEVs for phis are still created recursively.

Fixes #32078, #42594, #44546, #49293, #49599, #55333, #55511

Diff Detail

Event Timeline

fhahn created this revision.Nov 26 2021, 10:09 AM
fhahn requested review of this revision.Nov 26 2021, 10:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2021, 10:09 AM
lebedev.ri added a comment.EditedNov 26 2021, 10:25 AM

Thanks for looking into this!
I previously looked into this: https://github.com/LebedevRI/llvm-project/commit/6b006aa21caf018aa0f280828899d510274c8444
... but as it is apparent, never posted the diff. Nonetheless, it may be interesting to look at.

The most interesting question is, do we really want to avoid (indirect self-) recursion (through getSCEV)?
Then the api of ScalarEvolution needs more changes than what the current diff contains,
basically everything that can end up creating new SCEV expressions from IR
needs to be able to return "in-progress, requery me later" status.

nikic added a comment.Nov 28 2021, 1:02 PM

To ask the obvious question: Can we get away with simply adding a getSCEV recursion limit instead? Is there reason to believe that a sensible recursion cutoff (similar to the many SCEV already has) would adversely affect analysis quality in practice? I'd rather avoid the additional complexity if we can.

To ask the obvious question: Can we get away with simply adding a getSCEV recursion limit instead? Is there reason to believe that a sensible recursion cutoff (similar to the many SCEV already has) would adversely affect analysis quality in practice? I'd rather avoid the additional complexity if we can.

Recursion itself is rarely the right solution in the first place, let alone working around it by introducing random cut-offs.

fhahn added a comment.Nov 29 2021, 2:55 AM

Thanks for looking into this!
I previously looked into this: https://github.com/LebedevRI/llvm-project/commit/6b006aa21caf018aa0f280828899d510274c8444
... but as it is apparent, never posted the diff. Nonetheless, it may be interesting to look at.

The most interesting question is, do we really want to avoid (indirect self-) recursion (through getSCEV)?
Then the api of ScalarEvolution needs more changes than what the current diff contains,
basically everything that can end up creating new SCEV expressions from IR
needs to be able to return "in-progress, requery me later" status.

Thanks for sharing the patch. I think the current patch doesn't really need to change any existing interfaces, it just introduces a new expectation: before creating a SCEV for I, SCEVs for all of I's operands must already exist. It's up to the callers of createSCEV (and friends) to ensure that. The only additional complexity is 1) extracting operands for which we need to create SCEVs and 2) processing them in the right order.

At the moment, 1) is a bit unfortunate, because it requires some duplication of createSCEV logic, but overall it's not too bad (if no other interfaces need modifying IMO). I think we may be able to improve/unify the logic there as well, but I think first it would be good to get agreement on whether we want to handle the issue by constructing SCEVs iteratively or not. (Getting this patch to work for all cases, cleaning it up and making sure its as fast as possible will be quite a bit of additional work, which I'd only like to do once we it's clear that this is the overall direction we want to pursue :))

To ask the obvious question: Can we get away with simply adding a getSCEV recursion limit instead? Is there reason to believe that a sensible recursion cutoff (similar to the many SCEV already has) would adversely affect analysis quality in practice? I'd rather avoid the additional complexity if we can.

Recursion itself is rarely the right solution in the first place, let alone working around it by introducing random cut-offs.

I think one issue with a cut-off for the recursion is that such a cut-off would make SCEV instantiations depend on the instantiation order. I am not sure how much this would be an issue in practice, but among other things it may make verification of SCEV harder (e.g. if we compute SCEVs from scratch to the cached version we would have to make sure to use the same evaluation order as when computing the original SCEV).

fhahn added a comment.Nov 29 2021, 3:01 AM

>>>! In D114650#3157263, @lebedev.ri wrote:

To ask the obvious question: Can we get away with simply adding a getSCEV recursion limit instead? Is there reason to believe that a sensible recursion cutoff (similar to the many SCEV already has) would adversely affect analysis quality in practice? I'd rather avoid the additional complexity if we can.

Recursion itself is rarely the right solution in the first place, let alone working around it by introducing random cut-offs.

I think one issue with a cut-off for the recursion is that such a cut-off would make SCEV instantiations depend on the instantiation order. I am not sure how much this would be an issue in practice, but among other things it may make verification of SCEV harder (e.g. if we compute SCEVs from scratch to the cached version we would have to make sure to use the same evaluation order as when computing the original SCEV).

https://bugs.llvm.org/show_bug.cgi?id=32731 has a bit more discussion/context on the potential issues of a cut-off.

I want to make sure I understand the problem to be solved. The description and discussion confused me a bit, so I may in fact be wildly off here.

Is the basis issue being solved that calling createSCEV on some IR instruction can require recursing through a very deep chain of instructions which have not yet had SCEVs formed for them? More specifically, is that the *only* deep recursion we're concerned by? The discussion about new invariants for creating a SCEV seemed to indicate it might be much wider, but that discussion didn't parse for me.

If my understanding is correct, then I would have expected this to be effectively a change strictly inside createSCEV. The fact we seem to be changing things much more broadly confuses me. Maybe there's some complexity coming from the phi handling, but if so, maybe a first patch which *only* iteratively constructs arithmetic chains and relies on recursion for the tricky phi parts?

Putting this here only because the bug database is still readonly.

Random thought: We're discussing this as a SCEV cornercase, and it's probably worth fixing the case this exposes, but it's also worth noting here that the unroller is producing a horrendously bad form here. It might be worth considering whether the unroller should be applying a peephole as it unrolls to avoid generating this massive and utterly useless IR.

Specifically for this case: if generating an unrolled IV increment with no other uses of the IV, eagerly fold the step increments if constant. That is, instead of producing:

%a = gep i8, i8 %p, i64 1
%b = gep i8, i8 %b, i64 1
%c = gep i8, i8 %c, i64 1
%d = gep i8, i8 %d, i64 1

produce:

%a = gep i8, i8 %p, i64 1
%b = gep i8, i8 %p, i64 2
%c = gep i8, i8 %p, i64 3
%d = gep i8, i8 %p, i64 4
(All of which but the last are dead.)

Doing this might be a worthwhile compile time optimization on it's own.

I want to make sure I understand the problem to be solved. The description and discussion confused me a bit, so I may in fact be wildly off here.

Is the basis issue being solved that calling createSCEV on some IR instruction can require recursing through a very deep chain of instructions which have not yet had SCEVs formed for them? More specifically, is that the *only* deep recursion we're concerned by? The discussion about new invariants for creating a SCEV seemed to indicate it might be much wider, but that discussion didn't parse for me.

Yes, this is the main issue I'd like to address with this change, thanks for summarising this concisely!

If my understanding is correct, then I would have expected this to be effectively a change strictly inside createSCEV. The fact we seem to be changing things much more broadly confuses me. Maybe there's some complexity coming from the phi handling, but if so, maybe a first patch which *only* iteratively constructs arithmetic chains and relies on recursion for the tricky phi parts?

I think most of the 'cluttering' changes are getSCEV -> getExistingSCEV. Those are not strictly necessary and I should be able to strip those from the initial version, same for excluding the phi part.

The getSCEV -> getExistingSCEV changes are related to the invariant mentioned: if we iteratively construct the operands first, then we should be able to rely on the fact that SCEVs for all operands exist when construction a new SCEV, hence using getExistingSCEV instead of getSCEV.

fhahn planned changes to this revision.Jan 17 2022, 9:39 AM
fhahn updated this revision to Diff 400588.Jan 17 2022, 9:39 AM

Just a rebase so this can be applied on top of main.

I tried this patch to see if it solves the issue reported in https://github.com/llvm/llvm-project/issues/49293#issuecomment-981040924

It fails with the following error - and a request to submit a bug report:

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
 #0 0x0000000000801b33 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm-14/bin/ld.lld+0x801b33)
 #1 0x00000000007ffb0e llvm::sys::RunSignalHandlers() (/usr/lib/llvm-14/bin/ld.lld+0x7ffb0e)
 #2 0x000000000080212f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f3012f663c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
 #4 0x0000000002d3cd12 CompareSCEVComplexity(llvm::EquivalenceClasses<llvm::SCEV const*, std::less<llvm::SCEV const*> >&, llvm::EquivalenceClasses<llvm::Value const*, std::less<llvm::Value const*> >&, llvm::LoopInfo const*, llvm::SCEV const*, llvm::SCEV const*, llvm::DominatorTree&, unsigned int) ScalarEvolution.cpp:0:0
 #5 0x0000000002d0044f GroupByComplexity(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::LoopInfo*, llvm::DominatorTree&) ScalarEvolution.cpp:0:0
 #6 0x0000000002cf4608 llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int) (/usr/lib/llvm-14/bin/ld.lld+0x2cf4608)
 #7 0x0000000002d14f67 llvm::ScalarEvolution::createSCEV(llvm::Value*) (/usr/lib/llvm-14/bin/ld.lld+0x2d14f67)
 #8 0x0000000002d0714f llvm::ScalarEvolution::createSCEVIter(llvm::Value*) (/usr/lib/llvm-14/bin/ld.lld+0x2d0714f)
 #9 0x00000000025c508e llvm::SLPVectorizerPass::vectorizeGEPIndices(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/usr/lib/llvm-14/bin/ld.lld+0x25c508e)
#10 0x00000000025c280c llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/usr/lib/llvm-14/bin/ld.lld+0x25c280c)
#11 0x00000000025c20f1 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm-14/bin/ld.lld+0x25c20f1)
#12 0x00000000022c7b5d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm-14/bin/ld.lld+0x22c7b5d)
#13 0x0000000003181b59 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm-14/bin/ld.lld+0x3181b59)
#14 0x0000000000dfe32d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/usr/lib/llvm-14/bin/ld.lld+0xdfe32d)
#15 0x00000000031853f5 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm-14/bin/ld.lld+0x31853f5)
#16 0x0000000000dfe17d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm-14/bin/ld.lld+0xdfe17d)
#17 0x0000000003180d16 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/usr/lib/llvm-14/bin/ld.lld+0x3180d16)
#18 0x0000000001e7311c llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::vector<unsigned char, std::allocator<unsigned char> > const&) (/usr/lib/llvm-14/bin/ld.lld+0x1e7311c)
#19 0x0000000001e75a42 llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*> > const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > >*, std::vector<unsigned char, std::allocator<unsigned char> > const&)::$_3::operator()(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >) const LTOBackend.cpp:0:0
#20 0x0000000001e758bf llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*> > const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > >*, std::vector<unsigned char, std::allocator<unsigned char> > const&) (/usr/lib/llvm-14/bin/ld.lld+0x1e758bf)
#21 0x0000000001e6ed58 (anonymous namespace)::InProcessThinBackend::runThinLTOBackendThread(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)>, std::function<llvm::Expected<std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)> > (unsigned int, llvm::StringRef)>, unsigned int, llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void> > const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes> > > const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*> > const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > >&)::'lambda'(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)>)::operator()(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream> > > (unsigned int)>) const LTO.cpp:0:0
#22 0x0000000001e6e857 std::_Function_handler<void (), std::_Bind<(anonymous namespace)::InProcessThinBackend::start(unsigned int, llvm::BitcodeModule, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void> > const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes> > > const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > >&)::'lambda'(llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void> > const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes> > > const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*> > const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > >&) (llvm::BitcodeModule, std::reference_wrapper<llvm::ModuleSummaryIndex>, std::reference_wrapper<llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long> >, llvm::MallocAllocator> const>, std::reference_wrapper<llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void> > const>, std::reference_wrapper<std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes> > > const>, std::reference_wrapper<llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*> > const>, std::reference_wrapper<llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int> >, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule> > > > >)> >::_M_invoke(std::_Any_data const&) LTO.cpp:0:0
#23 0x0000000000b35f0c std::_Function_handler<void (), llvm::ThreadPool::createTaskAndFuture(std::function<void ()>)::'lambda'()>::_M_invoke(std::_Any_data const&) (/usr/lib/llvm-14/bin/ld.lld+0xb35f0c)
#24 0x0000000003253d11 void* llvm::thread::ThreadProxy<std::tuple<llvm::ThreadPool::grow(int)::$_0> >(void*) ThreadPool.cpp:0:0
#25 0x00007f3012f5a609 start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:478:7
#26 0x00007f301291c293 __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:97:0
fish: “/usr/lib/llvm-14/bin/ld.lld --h…” terminated by signal SIGSEGV (Address boundary error)
mkazantsev resigned from this revision.Mar 4 2022, 12:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 12:14 AM
fhahn updated this revision to Diff 424873.Apr 25 2022, 5:00 AM

Another rebase and attempt to reduce the diff by keeping getSCEV instead of getExistingSCEV. I am planning to further strip down the changes related to phi handling and have the initial version only construct SCEVs iteratively for IR without PHIs/cycles.

fhahn edited the summary of this revision. (Show Details)Apr 28 2022, 5:55 AM
fhahn edited the summary of this revision. (Show Details)May 10 2022, 3:10 AM
chapuni removed a subscriber: chapuni.
chapuni added a subscriber: chapuni.
fhahn updated this revision to Diff 439764.Jun 24 2022, 8:08 AM

Rebased and updated to keep recursive handling of phi nodes for now. I think this is as strippped down as possible for an initial version. Compile-time impact is more managable than for the initial version:

NewPM-O3: +0.11%
NewPM-ReleaseThinLTO: +0.12%
NewPM-ReleaseLTO-g: +0.08%

https://llvm-compile-time-tracker.com/compare.php?from=9d2349c78f93cba78c1f3b770355f2ac0cb98163&to=ff418d805ec44e288f4c8eb3150e1e874b26040c&stat=instructions

The patch can lead to slightly different (mostly better) results, because in some cases we do not run into various existing recursion limits.

I'd be curious to hear what people think about the latest version and the compile-time trade-off. It should fix a set of related crashes, linked to https://github.com/llvm/llvm-project/issues/44546. It fixes all crashes I was able to reproduce from the issue list on my local system.

fhahn retitled this revision from [SCEV] Construct SCEV iteratively (WIP). to [SCEV] Construct SCEV iteratively..Jun 24 2022, 8:20 AM
fhahn edited the summary of this revision. (Show Details)
nlopes added inline comments.Jun 24 2022, 10:05 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7141

Poison, poison! :)

fhahn marked an inline comment as done.Jun 25 2022, 8:44 AM
fhahn added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
7141

This is unrelated to this patch, I created D128586. I'll update things here once/if D128586 goes in

fhahn updated this revision to Diff 440131.Jun 27 2022, 1:54 AM
fhahn marked an inline comment as done.
fhahn edited the summary of this revision. (Show Details)

Updated to use PoisonValue after e4e22b6d8038.

nikic added inline comments.Jun 27 2022, 7:01 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7173

Add and mul currently have special code that tries to combine multiple sequential adds/mul into one getAddExpr/getMulExpr call. This is effectively lost here (or maybe worse, we end up doing both -- each add individually here, and then a multiple-add variant in getSCEV).

I think if we're going to change this (which might well make sense), we should probably change that separately, and also in the createSCEV() code as well. I suspect that this might account for both some of the codegen changes and some of the compile-time impact.

7199
fhahn updated this revision to Diff 440582.Jun 28 2022, 5:54 AM

Replace isSized check with assertion, traverse add/mul chains when collecting operands.

This slightly increases compile-time, and the geoman impact is now

NewPM-O3: +0.08%
NewPM-ReleaseThinLTO: +0.08%
NewPM-ReleaseLTO-g: +0.09%

https://llvm-compile-time-tracker.com/compare.php?from=403466860b628d6ada20a4b85b5b2ecdfe97b389&to=d662c2da9dc338dd260dacab19c3fd3f5b252fc0&stat=instructions

fhahn marked 2 inline comments as done.Jun 28 2022, 5:57 AM
fhahn added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
7173

Thanks, I updated getOperandsToCreate to also traverse mul/add chains. This indeed removed the IndVarSimplify changes and improved compile-time a bit.

We could go even further and enqueue (Opcode, Operands) tuples to create directly, instead of going through createSCEV, but this would require a way to encode at least negations for the Add case.

7199

Replace with an assert

fhahn updated this revision to Diff 440584.Jun 28 2022, 5:59 AM
fhahn marked 2 inline comments as done.

adjust comment about traversing add/mul chains in getOperandsToCreate.

nikic accepted this revision.Jun 28 2022, 6:28 AM

LG. Seems to be a recurring problem, and the solution looks sufficiently non-intrusive. I like that there is no requirement to precisely synchronize the logic between createSCEV and createSCEVIter here, which would be a PITA. If we don't queue enough operands, those will just get computed recursively.

llvm/lib/Analysis/ScalarEvolution.cpp
7259

I don't think this is strictly true due to the createNodeForSelectViaUMinSeq() code. And either way, I don't think we need to handle it this precisely here.

This revision is now accepted and ready to land.Jun 28 2022, 6:28 AM
This revision was landed with ongoing or failed builds.Jun 29 2022, 3:29 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
fhahn added inline comments.Jun 29 2022, 3:30 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7259

Thanks, I removed the check for Instruction to simplify things here.

alexfh added a subscriber: alexfh.Jul 3 2022, 11:02 PM

Heads up: this commit seems to cause miscompiles on some of our code. We're still working on an isolated test case.

alexfh added a comment.Jul 5 2022, 7:39 AM

Heads up: this commit seems to cause miscompiles on some of our code. We're still working on an isolated test case.

False alarm. The issue turned out to be caused by a UB in the code.

fhahn added a comment.Jul 5 2022, 7:09 PM

Heads up: this commit seems to cause miscompiles on some of our code. We're still working on an isolated test case.

False alarm. The issue turned out to be caused by a UB in the code.

Great, thanks for checking!

foad added a subscriber: foad.Jul 8 2022, 8:51 AM

Hi! We are seeing some large compile time increases with this patch. For example on F23720290, the time for opt -passes=loop-unroll goes from about 2s to 14s while producing identical output.

Hi! We are seeing some large compile time increases with this patch. For example on F23720290, the time for opt -passes=loop-unroll goes from about 2s to 14s while producing identical output.

Interesting, I'll take a look

fhahn added a comment.Jul 13 2022, 9:32 PM

Hi! We are seeing some large compile time increases with this patch. For example on F23720290, the time for opt -passes=loop-unroll goes from about 2s to 14s while producing identical output.

Interesting, I'll take a look

I put up a fix: D129731