Some nested loops may share the same ExitingBB, so after we finishing FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.
Details
Diff Detail
Event Timeline
The strategy we use across the optimizer is to drop cached values in the transform that invalidated them, not before their users (as the users are multiple). Do we know who broke it in this case?
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1447 | This whole logic is completely expensive as it visits the exit blocks multiple times. Why not just call forgetTopmostLoop(L)? |
I'm not really getting how wrong cached exit count may possibly lead to hang. Could you please explain this problem more?
From the cfg we can see that there exist a nested loop, and basic block if.then is a shared exitingBB between inner and outer loop.
When we start running IndVarSimplify pass on the inner loop, we can get block if.then's ExitCount by function ScalarEvolution::getExitCount(), also we can get inner loop's MaxExitCount by function getMaxBackedgeTakenCount.
If a exitingBB's exit count less than loop's max backedge taken count, we can say that this loop never exit from this exitingBB. So we can FoldExit, that is we replace the condition of exitingBB's BranchInst with a ConstantInt. So after we done IndVarSimplify pass on the inner loop, the if.the will be simplified into a uncondtional block.
However, when we begin running IndVarSimplify on outer loop, if.then also is outer loop's exitingBB, and we don't know it has been simplified. So we will get the cached exit count (which is totally wrong), this count is used for compute MaxBackedgeTakenCount of outer loop.
Unfortunately this MaxBackedgeTakenCount is less than block for.inc15's exit count. So IndVarSimplify think outer loop never exit in for.inc15, we can then fold this exitingBB, finally make outer loop endless.
Before IndVarSimplify
After IndVarSimplify
Thanks for explaining, I didn't realize we end up having wrong exact exit count.
I agree with the motivation, but I think that the fix should be unconditional drop of caches in the end of optimizeLoopExits if Changed == true.
Yse, So I use forgetTopmostLoop() after optimizeLoopExits directly, I think it's more concise.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1447 | The reason why I chose to visit exit blocks is that I think it would be expensive if we drop out all cached value for those nested loops that do not share exiting BB. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1781–1783 | As a follow-up, could you please also replace this to forgetTopmostLoop? Might be causing same type of problem I guess. |
This whole logic is completely expensive as it visits the exit blocks multiple times. Why not just call forgetTopmostLoop(L)?