This is an archive of the discontinued LLVM Phabricator instance.

DSE miscompile when store is clobbered across loop iterations
ClosedPublic

Authored by apilipenko on Sep 24 2019, 7:34 PM.

Details

Summary

(This is a resubmission of D67870 with llvm-commits is subscribers)

Currently DSE miscompiles the following example:

a = calloc(n+1)
for (int i = 0; i < n; i++) {

store 1, a[i+1] // (1)
store 0, a[i]   // (2)

}
It eliminates the second store thinking that it's redundant. This happens because memoryIsNotModifiedBetween doesn't see a[i] being clobbered in between the allocation and the store.

memoryIsNotModifiedBetween does a backwards scan through the CFG from SecondI to FirstI. It looks for instructions which can modify the memory location accessed by SecondI. For every may-write-to-memory instruction is asks AA whether this instruction modifies memory location accessed by SecondI.

The problem occurs when it visits the loop block through a backedge. It asks the AA about aliasing between stores (1) and (2). BasicAA sees that two stores access addresses which are distincs offsets from the same base and concludes that they don't alias. This is true for accesses within one loop iteration, but this is not true across iterations.

The change is to keep track whether we visit a block through a backedge. If we visit the block through a backedge, be conservative and treat any may-write-to-memory instruction as a clobber.

Note that there is a somewhat similar problem in DA:
https://bugs.llvm.org/show_bug.cgi?id=42143

Diff Detail

Event Timeline

bjope added inline comments.Oct 8 2019, 3:53 AM
lib/Transforms/Scalar/DeadStoreElimination.cpp
625 ↗(On Diff #221656)

Isn't store-block confusing?

630 ↗(On Diff #221656)

Load-block?

636 ↗(On Diff #221656)

Is LI an old name for FirstI or FirstBBI? There is several more refs to LI and SI in the code below. Makes it a little bit hard to follow what is going on here, since the code/comments are rotten already in the baseline. Would not mind if that was cleaned up in a separate commit.

641 ↗(On Diff #221656)

Is SI an old name for SecondI? Or SecondBBI?

666 ↗(On Diff #221656)

nit: I don't think that we need this local variable. Just use *PredI instead. If you keep it, then be explicit about the type instead of using auto. (As currently written you have similarly named variables that more or less can denote the same thing, but without really explaining the difference by having any type information. Just confusing IMHO.)

670 ↗(On Diff #221656)

If we have a cfg such as

A: (firstBB)
B: preds A, D
C: preds B (secondBB)
D: preds C

then I think we only visit block B once with this solution. And with VisitedThroughBackedge being false. It is only the SecondBB that can be visited twice afaict. Isn't that a problem?

Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2020, 4:29 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

This is specifically a hole in the calloc() handling, right? This case can't come up in regular load->store forwarding: the address of the load dominates the load, so the address dominates all the operations between the load and the store, so alias() does the right thing.

We could try to avoid the backedge checking for load->store forwarding. And we could try to be more aggressive if the address is some invariant offset into the calloc. Maybe not that likely to matter? What sort of impact does this have on how frequently the transforms triggers?

apilipenko updated this revision to Diff 246040.EditedFeb 21 2020, 4:27 PM

I reworked the patch using PHITransAddr. Now instead of tracking the backedge flag I keep track of the address to check and translate the address when needed.

This change should resolve the concern about overly conservative handling of load-store case, when the address dominates all the operations between load and store. Using PHITransAddr we would only do the translation if it's actually needed.

This makes sense. Please add at least one testcase where PHI translation looks through a PHI node and makes the transform successful.

Demonstrating an improvement accuracy due to PHI translation turned out to be a bit tricky.

So, it doesn't apply to load-store case as in this case the accessed addresses are the same, so no translation is needed.

In calloc case there are two limitations:

  1. Currently we look for stores to pointers with GetUnderlyingObject == calloc. Because GetUnderlyingObject doesn't look through phis we can't handle this case:
  %c = calloc
  %p1 = gep %c, X
  %p2 = gep %c, Y
  ...

label:  
  %p = phi %p1, %p2
  store 0, %p

So we are limited to cases like this:

  %c = calloc
  ...

label:
  %off = phi ...
  %p = gep %c, %off
  store 0, %p
  1. Currently I don't analyze the same block with different addresses. If such situation occurs I bail out assuming clobber (the same logic is used in MDA). But because the calloc dominates the store, eventually we'll visit the calloc block. If there are different paths reaching this block with different pointer values we would bail out. So, the example I came up with uses the same pointer on different paths. See test45 and test46 cases.

Essentially I'm using PHITransAddr as a marker that the address needs translation walking up to the predecessor. I don't make real use of the translated address due to lack of path sensitivity.

efriedma accepted this revision.Feb 26 2020, 5:00 PM

Okay, that makes sense. There isn't any particular reason we can't visit a block multiple times, but probably not worth doing unless we find some case where it matters.

LGTM

This revision is now accepted and ready to land.Feb 26 2020, 5:00 PM
This revision was automatically updated to reflect the committed changes.