This is an archive of the discontinued LLVM Phabricator instance.

Fix bugs in the MemorySSA walker
ClosedPublic

Authored by george.burgess.iv on Mar 10 2016, 2:39 PM.

Details

Summary

Test case:

define void @foo(i1 %b, i8* %ext) {
  %a = alloca i8, align 1
  %sink = alloca i8, align 1
  ; 1 = MemoryDef(liveOnEntry) <<<<<<<<
  store i8 1, i8* %a, align 1
  br i1 %b, label %if.b, label %end

if.b:
  ; 2 = MemoryDef(1)
  store i8 1, i8* %sink
  br label %end

end:
  ; 3 = MemoryPhi({if.b,2},{%0,1})
  ; MemoryUse(liveOnEntry)  <<<<<<<<<<<<<
  load i8, i8* %a, align 1
  ret void
}

...The load of %a comes back as MemoryUse(liveOnEntry), even though the store to %a (1 = MemoryDef) clobbers it on every path to the load. This happens because of how we recursively walk phis, and because of how we cache the results afterward. Both of these problems are (hopefully) fixed by this patch.

I plan to move the code around in the loop a bit (because break; as the last statement in a for is kind of dumb), but I’ve separated that from this patch because the refactor adds a fair amount of noise, and should be NFC.

Diff Detail

Repository
rL LLVM

Event Timeline

george.burgess.iv retitled this revision from to Fix bugs in the MemorySSA walker.
george.burgess.iv updated this object.
george.burgess.iv added a reviewer: dberlin.
george.burgess.iv added a subscriber: llvm-commits.
dberlin added inline comments. Mar 10 2016, 4:50 PM
lib/Transforms/Utils/MemorySSA.cpp
970 ↗(On Diff #50359)

Errr.
What?

A. In buildMemorySSA, we already detect forward-unreachable-from-entry blocks and mark all definitions in them as liveOnEntry. So the walk should never loop anywhere in the case you are describing, because if the loop has no entry, all the defs/uses in it should already be liveOnEntry.

So I don't see how this can happen.

However,

B. Because we already handle forward-unreachable-from-entry blocks and mark all defs in them as liveOnEntry in buildMemorySSA, is there a good reason we shouldn't just detect reverse-unreachable-from-entry blocks and do the same?

We can say that MemorySSA just doesn't model forward- or reverse-unreachable blocks, and make the IR sane in them. Because these accesses aren't real IR, at worst we can do what we do now:

Mark anything in an unreachable block as liveOnEntry, and remove all phis in unreachable blocks. We could go one step further and remove all the memory accesses there entirely if we wanted to.
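
Roughly, the shape would be something like the sketch below. Everything in it is made up for illustration (it is not the MemorySSA API; MemAccess, Block, and sanitizeUnreachable are stand-in names):

// Sketch only (made-up types, not the MemorySSA API): every def/use in an
// unreachable block gets liveOnEntry as its defining access, and phis in
// those blocks are dropped entirely.
#include <utility>
#include <vector>

struct MemAccess {
  bool IsPhi = false;
  MemAccess *DefiningAccess = nullptr;
};

struct Block {
  bool Reachable = false; // Forward- and reverse-reachable.
  std::vector<MemAccess *> Accesses;
};

static void sanitizeUnreachable(std::vector<Block> &Blocks,
                                MemAccess *LiveOnEntry) {
  for (Block &B : Blocks) {
    if (B.Reachable)
      continue;
    std::vector<MemAccess *> Kept;
    for (MemAccess *A : B.Accesses) {
      if (A->IsPhi)
        continue;                      // Phis in unreachable blocks go away.
      A->DefiningAccess = LiveOnEntry; // Everything else points at liveOnEntry.
      Kept.push_back(A);
    }
    B.Accesses = std::move(Kept);
  }
}

The "go one more" option above is just dropping the non-phi accesses too, instead of pointing them at liveOnEntry.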

george.burgess.iv marked an inline comment as done.

Addressed feedback

lib/Transforms/Utils/MemorySSA.cpp
970 ↗(On Diff #50359)

A. In buildMemorySSA, we already detect forward-unreachable-from-entry blocks and mark all definitions in them as liveOnEntry. So the walk should never loop anywhere in the case you are describing, because if the loop has no entry, all the defs/uses in it should already be liveOnEntry.

Correct.

So I don't see how this can happen.

I was thinking that the update API might be able to get us into this situation, but after looking at it, I think I was wrong. :)

Ternary replaced with assert(FirstDef);

Because we already handle forward-unreachable-from-entry blocks and mark all defs in them as liveOnEntry in buildMemorySSA, is there a good reason we shouldn't just detect reverse-unreachable-from-entry blocks and do the same?

I can't think of one. Though, I'm also not sure I understand backwards reachability, because I can't think of a case where a reverse unreachable-from-entry block would be walked by a DFS of the domtree. Would a trivial algorithm for determining reverse reachability-from-entry for some node N not be "flip the edges of the CFG, and see if Entry is reachable from N"?
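
For concreteness, the primitive I have in mind is just the sketch below, over a plain adjacency-list graph. None of this is from the patch, and reverseEdges / reachableFrom are made-up names; the part I'm unsure about is only which node to use as the source.

// Illustrative only -- nothing from the patch. The CFG is a plain adjacency
// list of successor edges; reverseEdges() flips them, and reachableFrom()
// is an ordinary worklist reachability walk.
#include <vector>

// Reverse every edge: U -> V becomes V -> U.
static std::vector<std::vector<int>>
reverseEdges(const std::vector<std::vector<int>> &Succs) {
  std::vector<std::vector<int>> Rev(Succs.size());
  for (int U = 0, E = (int)Succs.size(); U < E; ++U)
    for (int V : Succs[U])
      Rev[V].push_back(U);
  return Rev;
}

// Which nodes can we reach from Source by following edges forward?
static std::vector<bool>
reachableFrom(const std::vector<std::vector<int>> &Succs, int Source) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Worklist = {Source};
  Seen[Source] = true;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    for (int M : Succs[N])
      if (!Seen[M]) {
        Seen[M] = true;
        Worklist.push_back(M);
      }
  }
  return Seen;
}

Calling reachableFrom(reverseEdges(CFG), N) gives the set of blocks that can reach N in the original CFG; anything outside that set would be "reverse-unreachable" with respect to N.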

Mark anything in an unreachable block as liveOnEntry, and remove all phis in unreachable blocks. We could go one step further and remove all the memory accesses there entirely if we wanted to.

Would it be fine to just not build them in the first place?

dberlin edited edge metadata. Mar 11 2016, 12:05 PM

Just because this logic is getting fairly complicated, can you explain to me (either here or on a whiteboard) the failure mode that's occurring and that you're solving?

It's clear the testcase gives the wrong answer. Can you describe more of "why"?

lib/Transforms/Utils/MemorySSA.cpp
911 ↗(On Diff #50390)

Why did you move this out of the for loop init?

dberlin added inline comments. Mar 11 2016, 5:20 PM
lib/Transforms/Utils/MemorySSA.cpp
930 ↗(On Diff #50390)

I'm unclear why this makes sense. It tracks whether you've ever seen a backedge before hitting this phi.

If you see a backedge and move past that cycle, it will still trigger on the next cycle.

Honestly, this is also moving a bit far away from normal usage of the DFI iterator.
I am kinda wondering if we shouldn't just turn this into the traditional SCC-finding algorithm (treating the tuple of <thing, MemoryLocation> as the node we are visiting), and then we will know exactly which stuff is in a cycle and which is not.
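
For reference, the SCC part is just textbook Tarjan. A sketch with plain integer node IDs standing in for the <thing, MemoryLocation> tuples (nothing here is from the patch):

// Sketch only: textbook Tarjan SCC over an adjacency list. Every node ends
// up in exactly one SCC; single-node SCCs without a self-edge are the
// acyclic case.
#include <algorithm>
#include <utility>
#include <vector>

struct TarjanSCC {
  const std::vector<std::vector<int>> &Adj;
  std::vector<int> Index, LowLink, Stack;
  std::vector<bool> OnStack;
  std::vector<std::vector<int>> SCCs;
  int NextIndex = 0;

  explicit TarjanSCC(const std::vector<std::vector<int>> &Adj)
      : Adj(Adj), Index(Adj.size(), -1), LowLink(Adj.size(), 0),
        OnStack(Adj.size(), false) {
    for (int N = 0; N < (int)Adj.size(); ++N)
      if (Index[N] == -1)
        visit(N);
  }

  void visit(int N) {
    Index[N] = LowLink[N] = NextIndex++;
    Stack.push_back(N);
    OnStack[N] = true;
    for (int M : Adj[N]) {
      if (Index[M] == -1) {
        visit(M);
        LowLink[N] = std::min(LowLink[N], LowLink[M]);
      } else if (OnStack[M]) {
        LowLink[N] = std::min(LowLink[N], Index[M]);
      }
    }
    if (LowLink[N] == Index[N]) {
      // N is the root of an SCC; pop the stack down to it.
      std::vector<int> SCC;
      int M;
      do {
        M = Stack.back();
        Stack.pop_back();
        OnStack[M] = false;
        SCC.push_back(M);
      } while (M != N);
      SCCs.push_back(std::move(SCC));
    }
  }
};

Once the components are known, anything in a multi-node component (or with a self-edge) is part of a cycle, and the rest can be walked as a DAG.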

george.burgess.iv edited edge metadata.

It's clear the testcase gives the wrong answer. Can you describe more of "why"?

In that particular case, because we don't do anything useful with FirstDef after the phi loop :)
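
In sketch form, the shape of the problem is roughly the code below. The struct and the names (Access, walkOnePath, walkPhi, and how FirstDef is handled) are made up for illustration; this is not the actual MemorySSA.cpp loop.

// Sketch only -- not the MemorySSA.cpp code. Each access either clobbers the
// queried location or points at an earlier access; a phi merges paths.
#include <cassert>
#include <vector>

struct Access {
  enum Kind { LiveOnEntry, Def, Phi } K;
  bool ClobbersQuery = false;       // Only meaningful for Defs.
  Access *DefiningAccess = nullptr; // Only meaningful for Defs.
  std::vector<Access *> Incoming;   // Only meaningful for Phis.
};

// Walk a single upward chain until something clobbers the queried location.
static Access *walkOnePath(Access *A) {
  while (A->K == Access::Def && !A->ClobbersQuery)
    A = A->DefiningAccess;
  return A; // A clobbering Def, liveOnEntry, or a nested phi.
}

static Access *walkPhi(Access *Phi) {
  assert(Phi->K == Access::Phi && !Phi->Incoming.empty());
  Access *FirstDef = nullptr;  // A clobber seen on some incoming path.
  Access *NoClobber = nullptr; // A path that never hit a clobber.
  bool AllSameClobber = true;
  for (Access *In : Phi->Incoming) {
    Access *Result = walkOnePath(In);
    if (Result->K != Access::Def) {
      NoClobber = Result; // liveOnEntry (or a nested phi).
      AllSameClobber = false;
      continue;
    }
    if (!FirstDef)
      FirstDef = Result;
    else if (Result != FirstDef)
      AllSameClobber = false;
  }
  // The bug class this patch fixes is falling through here and returning
  // liveOnEntry even though FirstDef was set, i.e. forgetting the clobber
  // the walk actually found on the way up.
  if (!FirstDef)
    return NoClobber;                     // No path had a clobber at all.
  return AllSameClobber ? FirstDef : Phi; // Otherwise the clobber -- or,
                                          // conservatively, the phi -- wins.
}

The test case at the top is the all-paths-hit-the-same-clobber case: forgetting FirstDef after the loop turns what should be a MemoryDef answer into liveOnEntry.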

I guess I tried to cram too many "fixes" into one patch, and was unclear about that -- sorry.

As it stood, the patch fixed the above code sample, and made us more accurate in cases like:

define void @f(i8* noalias %p1, i8* noalias %p2) {
  ; 1 = MemoryDef(liveOnEntry)
  store i8 0, i8* %p1
  ; MemoryUse(1)
  load i8, i8* %p1
  br i1 undef, label %a, label %b

a:
  ; 2 = MemoryDef(1)
  store i8 0, i8* %p2
  br i1 undef, label %c, label %d

b:
  ; 3 = MemoryDef(1)
  store i8 1, i8* %p2
  br i1 undef, label %c, label %d

c:
  ; 6 = MemoryPhi({a,2},{b,3})
  ; 4 = MemoryDef(6)
  store i8 2, i8* %p2
  br label %e

d:
  ; 7 = MemoryPhi({a,2},{b,3})
  ; 5 = MemoryDef(7)
  store i8 3, i8* %p2
  br label %e

e:
  ; 8 = MemoryPhi({c,4},{d,5})
  ; MemoryUse(8) << Should be MemoryUse(1)
  load i8, i8* %p1
  ret void
}

...And any optimizable case where we have a non-cyclic phi structure that visits the same phi twice (the VisitedOnlyOne check gets triggered).

Though, I think there's a better way to handle that *and* cyclic phis that lets us leave the walking loop mostly untouched. So, I backed out the enhancements and made this review strictly about correctness fixes. I'll work on something to give us better accuracy and send it out separately.

If you'd like more examples, I left the tests I think we can do better on in the patch (with ; FIXME: comments in them). I'll pull them out prior to committing this and move them to the enhancements patch.

At this point, LGTM.

I think we discussed offline that, as the caching gets more complex, we should either seriously rethink the caching scheme (since we know it sucks and, in fact, is useless for MemoryUses), relook at the BFS version to see if it makes things simpler, or explore other walking algorithms (collapse cycles into SCCs), etc.

I think we discussed offline that, as the caching gets more complex, we should either seriously rethink the caching scheme (since we know it sucks and, in fact, is useless for MemoryUses), relook at the BFS version to see if it makes things simpler, or explore other walking algorithms (collapse cycles into SCCs), etc.

SG. I'll dig up + play around with the BFS code, and see if that makes our lives easier. If that fails, we'll figure out something from there. :)

Thanks for the review.

This revision was automatically updated to reflect the committed changes.