This is an archive of the discontinued LLVM Phabricator instance.

[MustExecute] Fix algorithmic bug in isGuaranteedToExecute. PR38514
ClosedPublic

Authored by mkazantsev on Aug 10 2018, 5:29 AM.

Details

Summary

The description of isGuaranteedToExecute does not correspond to its implementation.
According to description, it should return true if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated alrogithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks, however an infinite loop can have exit blocks.

As result of that, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them basing on this fact.
This may introduce UB to programs which did not contain it initially.

This patch removes the problematic algorithm and replaced it with a one which tries to
prove what is required in description.

Diff Detail

Event Timeline

mkazantsev edited the summary of this revision. (Show Details)Aug 10 2018, 5:31 AM

Max, let's discuss offline. The approach taken here would work, but I'm worried about the change in algorithmic complexity. I think we can probably find a narrower fix here.

Separately from the bug fix, using an approach like this might be interesting for better opt quality.

We can easily avoid complexity growth by caching transitive predecessors of each block. It takes O(N^2) memory, but makes collection of preds an amortized O(N) operation. We can as well cache the returned values, making the entire algorithm amortized O(N^2).

I just don't think it's a really good idea to consume that much memory for a task like that. But off course, let's discuss offline.

BTW, after giving it some more thought, the complexity of the old algorithm is O(Exiting blocks in loop), and the new one is O(Predecessors of a block). I'm not sure if it is a growth at all, not clear which number is bigger on average.

Actually I've checked getExitBlocks and it simply iterates through all loop blocks and checks its successors. So the complexity of the old algorithm is O(NumAllBlocks+ NumSuccessorsOfAllBlocks). The new algorithm collects transitive predecessors of a block and checks their successors, so its complexity is O(NumPredecessors+ NumSuccessorsOfPredecessors). Which is strictly not greater than the old algorithm. So this patch reduces complexity, not increases it. :)

skatkov added inline comments.Aug 12 2018, 8:27 PM
lib/Analysis/MustExecute.cpp
197

I would say this comment is a bit messy. Instead, I would suggest to update the comment before the first loop which is pretty clear to add an info that we could also handle not only exit blocks but also loop blocks but CanProveNotTakenFirstIteration is written to support only exit blocks. Another to do might be to support not only 1st iteration but others as well. But it will significantly increase the complexity of your algorithm.

mkazantsev planned changes to this revision.Aug 15 2018, 1:00 AM
mkazantsev marked an inline comment as done.Aug 15 2018, 4:30 AM

Max convinced me that there's no algorithmic complexity change implied here. Given the getLoopExits code walks all edges leaving blocks in the loop, both the old and new code is O(outbound edges of blocks in loop).

I've added a bunch of code comments, but I am anticipating approving this once those details are fixed.

lib/Analysis/MustExecute.cpp
118

Separate and land please.

167

Add to the end of your comment: This set will be a subset of the blocks within the loop.

170

What you have here is fine, but it might be cleaner to write this as:
Worklist.push_back(BB);
do {
} while (!Worklist.empty())

172

I think you meant push_back(Pred) here.

177

Add to comment: and don't want to leave the loop.

180

Minor: One slightly non-obvious piece is that there can be cycles *within* a loop. Either sub-loops or non-loop cycles. The code does appear to handle that case, but might be worth a comment/test.

201

There's a subtlety here that may justify a comment or code structure change. Specifically, we don't know that this particular path (out of the possible paths we identified) is taken on the first iteration. So why is discharging the condition on that particular path on the first iterations sufficient? Answer: because it implies that either a) this path is taken on the first iteration (and thus the exit isn't) or be another path is taken by the first iteration and by the time we execute this path, the instruction of interest has already executed.

reames requested changes to this revision.Aug 15 2018, 9:53 AM
This revision now requires changes to proceed.Aug 15 2018, 9:53 AM
mkazantsev added inline comments.Aug 15 2018, 11:17 PM
lib/Analysis/MustExecute.cpp
172

Errr, yes. Thanks for pointing out!

mkazantsev marked 5 inline comments as done.

Addressed comments, added one more test.

lib/Analysis/MustExecute.cpp
201

I tried to elaborate the comment, hope it expresses the same idea.

LGTM

Thinking about it, I think it may be worth introducing a separate isGuaranteedToExecuteBeforeLoopExit with the old logic. While LICM's hoisting logic needs the "must execute if loop entered" property, load store promotion actually only needs "store executes before any taken exits" property. (We do need one "must execute" dereferencing instruction, but it doesn't have to be the store.)

reames accepted this revision.Aug 16 2018, 10:55 AM
This revision is now accepted and ready to land.Aug 16 2018, 10:55 AM

JFYI this is somewhat of a hornet's nest -- I believe cases like test_impossible_exit_in_untaken_block are actually correct for C/C++ because infloops without side effects are UB.

JFYI this is somewhat of a hornet's nest -- I believe cases like test_impossible_exit_in_untaken_block are actually correct for C/C++ because infloops without side effects are UB.

I will add similar tests with volatile loads in header, so that it would be a clearly defined situation in C++ and the same problem showed up.

mkazantsev added a comment.EditedAug 16 2018, 10:22 PM

JFYI this is somewhat of a hornet's nest -- I believe cases like test_impossible_exit_in_untaken_block are actually correct for C/C++ because infloops without side effects are UB.

I will add similar tests with volatile loads in header, so that it would be a clearly defined situation in C++ and the same problem showed up.

Actually this logic is accidentally correct right now because isGuaranteedToTransferExecutionToSuccessor thinks that volatile loads don't transfer execution to successors (which was under discurrion and probably needs removed) and MustExecute is super-conservative about such instructions in non-header blocks (see https://reviews.llvm.org/D50377). When MustThrow analysis becomes less restrictive, this problem becomes a real one and we hoist sdiv from such examples despite the fact that they are no longer "finite" according to C++ specification.

So the current behavior of this analysis is wrong, but due to limitations of MustThrow analysis it doesn't affect any real example. As I improve the MustThrow analysis, a lot of things like this show up.

I've added a couple of such tests as https://reviews.llvm.org/rL339983 to make sure that they won't break even in spite of improved MustThrow analysis.

This revision was automatically updated to reflect the committed changes.

Actually this logic is accidentally correct right now because isGuaranteedToTransferExecutionToSuccessor thinks that volatile loads don't transfer execution to successors (which was under discurrion and probably needs removed) and MustExecute is super-conservative about such instructions in non-header blocks (see https://reviews.llvm.org/D50377). When MustThrow analysis becomes less restrictive, this problem becomes a real one and we hoist sdiv from such examples despite the fact that they are no longer "finite" according to C++ specification.

Sorry, I wasn't clear. I meant to say that transforming

while (true) {
  if (<unknown condition>) {
    int c = 1 / divisor;
    break;
  }
}

to

int c = 1 / divisor;
while (true) {
  if (<unknown condition>) {
    break;
  }
}

is correct for C++ (because non-side effecting infloops are UB) and that I believe your change will "regress" this optimization (since LLVM will not longer hoist the division). Though I guess if no-one complains about it in a couple of days then we're fine?? :)

What you're saying, that it would be a bug even for C++ if LLVM hoisted the divide from

while (true) {
  <volatile load>
  if (<unknown condition>) {
    int c = 1 / divisor;
    break;
  }
}

is correct, but not quite what I'm trying to say.

Also, given what you said about isGuaranteedToTransferExecutionToSuccessor, perhaps we can close llvm.org/PR24078?

You don't need a volatile operation; [intro.progress] also mentions atomic operations.

I guess we can close PR24078 now.