This is an archive of the discontinued LLVM Phabricator instance.

[IndVarSimplify] Notify top most loop to drop cached exit counts
ClosedPublic

Authored by guopeilin on Nov 12 2020, 12:49 AM.

Details

Summary

Some nested loops may share the same ExitingBB, so after we finishing FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.

Diff Detail

Event Timeline

guopeilin created this revision.Nov 12 2020, 12:49 AM
guopeilin requested review of this revision.Nov 12 2020, 12:49 AM

The strategy we use across the optimizer is to drop cached values in the transform that invalidated them, not before their users (as the users are multiple). Do we know who broke it in this case?

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1459

This whole logic is completely expensive as it visits the exit blocks multiple times. Why not just call forgetTopmostLoop(L)?

I'm not really getting how wrong cached exit count may possibly lead to hang. Could you please explain this problem more?

guopeilin added a comment.EditedNov 13 2020, 12:49 AM

I'm not really getting how wrong cached exit count may possibly lead to hang. Could you please explain this problem more?

From the cfg we can see that there exist a nested loop, and basic block if.then is a shared exitingBB between inner and outer loop.
When we start running IndVarSimplify pass on the inner loop, we can get block if.then's ExitCount by function ScalarEvolution::getExitCount(), also we can get inner loop's MaxExitCount by function getMaxBackedgeTakenCount.
If a exitingBB's exit count less than loop's max backedge taken count, we can say that this loop never exit from this exitingBB. So we can FoldExit, that is we replace the condition of exitingBB's BranchInst with a ConstantInt. So after we done IndVarSimplify pass on the inner loop, the if.the will be simplified into a uncondtional block.
However, when we begin running IndVarSimplify on outer loop, if.then also is outer loop's exitingBB, and we don't know it has been simplified. So we will get the cached exit count (which is totally wrong), this count is used for compute MaxBackedgeTakenCount of outer loop.
Unfortunately this MaxBackedgeTakenCount is less than block for.inc15's exit count. So IndVarSimplify think outer loop never exit in for.inc15, we can then fold this exitingBB, finally make outer loop endless.

Before IndVarSimplify

After IndVarSimplify

Thanks for explaining, I didn't realize we end up having wrong exact exit count.

I agree with the motivation, but I think that the fix should be unconditional drop of caches in the end of optimizeLoopExits if Changed == true.

guopeilin updated this revision to Diff 305645.Nov 16 2020, 8:13 PM
guopeilin retitled this revision from [IndVarSimplify] Drop any stored trip count value before IndVarSimplify to [IndVarSimplify] Notify top most loop to drop cached exit counts.

use forgetTopmostLoop() directly instead of visiting exitingBB among nested loops

Yse, So I use forgetTopmostLoop() after optimizeLoopExits directly, I think it's more concise.

Thanks for explaining, I didn't realize we end up having wrong exact exit count.

I agree with the motivation, but I think that the fix should be unconditional drop of caches in the end of optimizeLoopExits if Changed == true.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1459

The reason why I chose to visit exit blocks is that I think it would be expensive if we drop out all cached value for those nested loops that do not share exiting BB.

mkazantsev accepted this revision.Nov 17 2020, 8:28 PM

LGTM, thanks.

This revision is now accepted and ready to land.Nov 17 2020, 8:28 PM
mkazantsev added inline comments.Nov 17 2020, 8:47 PM
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
1793–1795

As a follow-up, could you please also replace this to forgetTopmostLoop? Might be causing same type of problem I guess.