This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Cache ZExt SCEV expressions.
ClosedPublic

Authored by fhahn on Nov 5 2022, 4:20 PM.

Details

Summary

When creating SCEV expressions for ZExt, there's quite a bit of
reasoning done and in many places the reasoning in turn will try to
create new SCEVs for other ZExts.

This can have a huge compile-time impact. The attached test from #58402
takes an excessive amount of compile time; without the patch, the test
doesn't complete in 1500+ seconds, but with the patch it completes in 1
second.

To speed up this case, cache created ZExt expressions for given (SCEV, Ty) pairs.
Caching just ZExts is relatively straight-forward, but it might make
sense to extend it to other expressions in the future.

This has a slight positive impact on CTMark:

  • O3: -0.07%
  • ReleaseThinLTO: -0.06%
  • ReleaseLTO-g: -0.04%

https://llvm-compile-time-tracker.com/compare.php?from=1f034e207885d7edfa34b5408c7c872a966febcd&to=08d9e0f7966745fa300ac4a6c55692666daab68c&stat=instructions:u

The patch also improves compile-time for some internal real-world workloads
where time spent in SCEV goes from ~300 seconds to ~3 seconds.

There are a few cases where computing & caching the result earlier may
return more pessimistic results, but the compile-time savings seem to
outweigh that.

Fixes #58402.

Diff Detail

Event Timeline

fhahn created this revision.Nov 5 2022, 4:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2022, 4:20 PM
fhahn requested review of this revision.Nov 5 2022, 4:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2022, 4:20 PM
nikic added a comment.Nov 6 2022, 1:48 AM

To make sure I understand the idea here correctly: We already cache the case where the zext does not fold through the early folding set lookup. This additional cache only helps with the case where the zext does fold, and the early lookup thus fails, is that correct?

If so, I feel like this needs some more general solution. We're using the same pattern (early folding set lookup to bypass most of the folding logic) in most (all?) other SCEV creation methods as well, and introducing an independent cache for each of these would not be great. I wonder if we could insert a "forwarding" node into the folding set for all folded SCEV expressions, which points to the folding result, or something along those lines.

mkazantsev added a comment.EditedNov 11 2022, 12:48 AM

Agreed with Nikita that generalizing it to something wider than zext could be a good idea. How hard is that?

One more cache always scares me. Any ideas how we can strengthen verifier to find problems with it? I am specifically interested in dangling pointers.

llvm/lib/Analysis/ScalarEvolution.cpp
13905

Why not ZExtCacheUser.erase(S); after this loop?

fhahn updated this revision to Diff 474777.Nov 11 2022, 8:36 AM

To make sure I understand the idea here correctly: We already cache the case where the zext does not fold through the early folding set lookup. This additional cache only helps with the case where the zext does fold, and the early lookup thus fails, is that correct?

If so, I feel like this needs some more general solution. We're using the same pattern (early folding set lookup to bypass most of the folding logic) in most (all?) other SCEV creation methods as well, and introducing an independent cache for each of these would not be great. I wonder if we could insert a "forwarding" node into the folding set for all folded SCEV expressions, which points to the folding result, or something along those lines.

Yes, that's exactly the issue. Currently the folding set (UniqueSCEV) only contains ZExt entries if we actually created a ZExt expression, not if it got folded to a different expression.

Adding forwarding nodes so we can add references to folded expressions would work I think, but it might be easier to have maintain a separate map for folds with a more general key. I updated the patch to use a more flexible key (similar to FoldingSetNodeID, but with a smaller bits vector) to sketch this out. The new version is mostly neutral in compile-time on CTMark, but still has the same huge improvements on the cases mentioned in the description. (https://llvm-compile-time-tracker.com/compare.php?from=2116d69f100c243069be1e76ac7fdac65ea5328a&to=05897510fbf52a59e0ce28f3f4ea4d06d59e380e&stat=instructions:u)

If we go down the forwarding route, IIUC we would have the add a new SCEV type for that. Happy to play around with that option as well, if people feel this would be more desirable.

One more cache always scares me. Any ideas how we can strengthen verifier to find problems with it? I am specifically interested in dangling pointers.

That's a good point! I am not sure, but it looks like we never actually deallocate SCEV objects, so not sure what to best verify. I updated to code to with the extra invalidation and also added it to forgetAllLoops.

One thing we can do in verifier is to check that FoldCache and FoldCacheUser are in sync, meaning that for each {SCEV, FoldID} in FoldCacheUser there is {FoldId, SCEV} in FoldCache and vice versa. Just in case if one of them will be erased without doing the 2nd part.

llvm/include/llvm/Analysis/ScalarEvolution.h
73

It looks general enough to be moved to some utils, no? Just to think in the future.

79

The fact that we have different API for signed and unsigned leads me to think that we might be able to distinguish between them, e.g. distinguish set with single (int)1 from set with (unsigned) 1, but in fact we can't. What was the intention?

Maybe better remove signed version.

84

What worries me is that how easy to get a collision here. Basically, {0, 0, ~0, ~0, 0, 0} will collide with {0ULL, ~0ULL, 0ULL}. In your use case it should really be fine, because we add a lot of pointers, but if it is to be done more universal, need to think on making it harder to cheat (mix in types into hash?).

97

assert(!Bits.empty())?
Otherwise, if we want to allow empty sets (dunno why, just for generality), maybe start with Hash =Bits.size() and then mix in all elements?

125

is it just & std::max<int>()? Not asking to change, just trying to understand what was the intention.

fhahn updated this revision to Diff 478122.Nov 27 2022, 4:08 PM
fhahn marked an inline comment as done.

One thing we can do in verifier is to check that FoldCache and FoldCacheUser are in sync, meaning that for each {SCEV, FoldID} in FoldCacheUser there is {FoldId, SCEV} in FoldCache and vice versa. Just in case if one of them will be erased without doing the 2nd part.

Thanks for elaborating! I added such verificatio and addressed the other comments.

fhahn marked 3 inline comments as done.Nov 27 2022, 4:13 PM
fhahn added inline comments.
llvm/include/llvm/Analysis/ScalarEvolution.h
73

It is similar to FoldingSetNodeID which is used in multiple places. I played around with that quite a bit, but didn't manage to reach the same performance as with the version inline here.

Having this separate here is not ideal, but at the moment I don't see an easy way to unify/share the code.

79

The main reason for the integer overload is to avoid ambiguous function calls when addInteger is called with signed ints. In the end the signdness doesn't really matter, as we only look at the bits.

84

Yeah, unfortunately this is a potential issue. I went with your suggestion to include the number of entries in the vector in the hash, which should hopefully help a bit.

97

Otherwise, if we want to allow empty sets (dunno why, just for generality), maybe start with Hash =Bits.size() and then mix in all elements?

I went with adding Hash = Bits.size().

125

The main reason was to avoid potential conflicts with the empty/tombstone values, but that shouldn't really help much. Also, now that the hash uses Hash = Bits.size() it should be even less of an issue.

fhahn marked 3 inline comments as done.Dec 5 2022, 2:48 AM

ping :)

mkazantsev accepted this revision.Dec 7 2022, 12:00 AM

LG, thanks for doing this!

This revision is now accepted and ready to land.Dec 7 2022, 12:00 AM
nikic added a comment.Dec 7 2022, 2:20 AM

To make sure I understand the idea here correctly: We already cache the case where the zext does not fold through the early folding set lookup. This additional cache only helps with the case where the zext does fold, and the early lookup thus fails, is that correct?

If so, I feel like this needs some more general solution. We're using the same pattern (early folding set lookup to bypass most of the folding logic) in most (all?) other SCEV creation methods as well, and introducing an independent cache for each of these would not be great. I wonder if we could insert a "forwarding" node into the folding set for all folded SCEV expressions, which points to the folding result, or something along those lines.

Yes, that's exactly the issue. Currently the folding set (UniqueSCEV) only contains ZExt entries if we actually created a ZExt expression, not if it got folded to a different expression.

Adding forwarding nodes so we can add references to folded expressions would work I think, but it might be easier to have maintain a separate map for folds with a more general key. I updated the patch to use a more flexible key (similar to FoldingSetNodeID, but with a smaller bits vector) to sketch this out. The new version is mostly neutral in compile-time on CTMark, but still has the same huge improvements on the cases mentioned in the description. (https://llvm-compile-time-tracker.com/compare.php?from=2116d69f100c243069be1e76ac7fdac65ea5328a&to=05897510fbf52a59e0ce28f3f4ea4d06d59e380e&stat=instructions:u)

If we go down the forwarding route, IIUC we would have the add a new SCEV type for that. Happy to play around with that option as well, if people feel this would be more desirable.

My main thought here was that a forwarding node would avoid the need to do two folding set lookups, and should also avoid the need for the separate FoldID.

nikic added a comment.Dec 7 2022, 2:24 AM

Looks like the new test fails on pre-commit CI.

nikic added inline comments.Dec 7 2022, 4:10 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1624

I'm a bit uncertain about the invalidation story here. We will invalidate the cache if the folding result is invalidated. We will not invalidate if the original zext operand is invalidated. Could this result in problems?

nikic added a comment.Dec 7 2022, 6:46 AM

My main thought here was that a forwarding node would avoid the need to do two folding set lookups, and should also avoid the need for the separate FoldID.

For what it's worth, I roughly sketched out what I had in mind here: https://github.com/llvm/llvm-project/commit/b158b70df1bb49036da8312a76f0b54de6466f7f This causes a change in incorrect-
exit-count.ll though, so probably it's not right.

fhahn updated this revision to Diff 481283.Dec 8 2022, 8:07 AM
fhahn marked an inline comment as done.

My main thought here was that a forwarding node would avoid the need to do two folding set lookups, and should also avoid the need for the separate FoldID.

For what it's worth, I roughly sketched out what I had in mind here: https://github.com/llvm/llvm-project/commit/b158b70df1bb49036da8312a76f0b54de6466f7f This causes a change in incorrect-
exit-count.ll though, so probably it's not right.

Thanks for sharing! I went ahead and cleaned up the code in my patch a bit, with an early exit if the entry is cached and only add the resulting SCEV to the cache if it is not a SCEVZeroExtendExpr.

This improves the compile-time impact a bit further:
https://llvm-compile-time-tracker.com/compare.php?from=bf9de7464946c65f488fe86ea61bfdecb8c654c1&to=5ac0108553992fb3d58bc27b1518e8cf06658a32&stat=instructions:u

fhahn added a comment.Dec 8 2022, 8:08 AM

The latest update also moves the FoldID definition inside ScalarEvolution and the template specialization to the bottom of the file.

llvm/lib/Analysis/ScalarEvolution.cpp
1624

I think in that case, invalidation of the ZExt operand will trigger invalidation of the SCEV for the ZExt via the IR use-list based invalidation.

This revision was automatically updated to reflect the committed changes.

Hi,

We see a verifier error with this patch:

opt -verify-scev -passes="require<iv-users>" bbi-77099.ll -o /dev/null

results in

Entry in FoldCache doesn't match FoldCacheUser: (1 + (zext i16 {-4,+,4}<nsw><%for.body919.i> to i32))<nuw><nsw> != {65533,+,4}<nuw><nsw><%for.body919.i>!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/opt -verify-scev -passes=require<iv-users> bbi-77099.ll -o /dev/null
 #0 0x0000000002ef65a3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2ef65a3)
 #1 0x0000000002ef42ce llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2ef42ce)
 #2 0x0000000002ef6926 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f6e840f4630 __restore_rt sigaction.c:0:0
 #4 0x00007f6e8183b387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f6e8183ca78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x0000000001ef311d llvm::ScalarEvolution::verify() const (../../main-github/llvm/build-all/bin/opt+0x1ef311d)
 #7 0x00000000038d666b llvm::FunctionToLoopPassAdaptor::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x38d666b)
 #8 0x000000000326cc1d llvm::detail::PassModel<llvm::Function, llvm::FunctionToLoopPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) crtstuff.c:0:0
 #9 0x0000000002700edc llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x2700edc)
#10 0x0000000000b0ff1d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) crtstuff.c:0:0
#11 0x000000000270516e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x270516e)
#12 0x0000000000b0fcfd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) crtstuff.c:0:0
#13 0x000000000270018c llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x270018c)
#14 0x0000000000729aa3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x729aa3)
#15 0x0000000000738cc7 main (../../main-github/llvm/build-all/bin/opt+0x738cc7)
#16 0x00007f6e81827555 __libc_start_main (/lib64/libc.so.6+0x22555)
#17 0x0000000000722b10 _start (../../main-github/llvm/build-all/bin/opt+0x722b10)
Abort (core dumped)

fhahn added a comment.Dec 21 2022, 3:31 PM

Thanks, I am looking into it.

fhahn added a comment.Dec 27 2022, 4:19 PM

Should be fixed by a564048899a1

Should be fixed by a564048899a1

Yep, thanks!