Page MenuHomePhabricator

API to update MemorySSA for cloned blocks.
ClosedPublic

Authored by asbirlea on Apr 4 2018, 4:49 PM.

Details

Summary

End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
include/llvm/Analysis/MemorySSAUpdater.h
131

nit: why do we take a &DT in this one and a *DT in the other?

267

It's unclear to me what the relation between phis/definitions in "a map of MemoryPhi : Definition" is here. Please elaborate

lib/Analysis/MemorySSA.cpp
1548

Why || defined(LLVM_ENABLE_DUMP)?

1553

nit: s/== true//

lib/Analysis/MemorySSAUpdater.cpp
404

naming nit: removeAllDuplicatePhiEdgesBetween?

425

please use dyn_cast if MA shouldn't be null

426

nit: this should only be null if MA is liveOnEntry. Please check against that instead, for clarity

430

same "please don't use or_null if null isn't OK" nit. (though it looks like we could do else { cast<MemoryPhi> here, since we're checking for MUD above?)

441

s/or_null//

484

s/or_null//

485

Please check against liveOnEntry instead

494

s/or_null// (and we can probably change this into else { cast<MemoryPhi> anyway)

550

can this be a range-for across ExitBlocks?

590

nit: please either remove the braces after the if or add them after the else. I'm impartial

697

probably redundant with the line below (unless it's possible for us to see the same BB repeatedly?)

702

Looks like you can replace everything between { ... } with EdgeCountMap[{Pi, BB}]++;.

operator[] on map-like containers generally acts like "return a reference if it exists. otherwise, construct or zero-init a member in the map and return a ref to that"

708

same "looks like you can replace everything between ..."

720

nit: There should probably be spaces before the end of each of these quotes

866

nit: I think this is long enough that it's best to just have braces after the for and if on lines 837/838. :)

asbirlea updated this revision to Diff 160935.Aug 15 2018, 4:00 PM
asbirlea marked 21 inline comments as done.

Address comments.
Thank you for the detailed nits :).

lib/Analysis/MemorySSA.cpp
1548

copy-paste :)..removed.

lib/Analysis/MemorySSAUpdater.cpp
702

Ah, I wasn't sure it was zero-inited.

asbirlea updated this revision to Diff 160938.Aug 15 2018, 4:14 PM

Few nits.

asbirlea updated this revision to Diff 160942.Aug 15 2018, 4:38 PM

Rename DTUpdate to CFGUpdate, since now we're using the Update in Support.

kuhar added a subscriber: kuhar.Aug 20 2018, 1:54 PM
kuhar added inline comments.
lib/Analysis/MemorySSAUpdater.cpp
662

I don't think this will be a bottleneck, but this can be optimized by starting the search at the node with the lowest level (highest in the tree).

Lots of nits, and a few functional concerns. For the functional concerns that revolve around dubious edge additions, I'm happy with fixes or "it's impossible for that to actually happen; added a comment explaining that."

(If we do take the "that's impossible; have a comment" route, I assume that we're gating the new loop+MSSA transforms behind some loop-enable-mssa-like flag for a while to gain confidence about the correctness of this all?)

I also think we should have unittests (simple ones are perfectly OK) for the new updater functionality, please.

Like said, this review gave me a few things to do. I'll ping this with revisions when all of that's landed...

lib/Analysis/MemorySSAUpdater.cpp
429

It's not immediately clear to me that this can't be tripped in general. e.g.

void foo(int j, int *i) {
  if (j == 1 || j == 2) {
    *i = 0;
  } else {
    *i = 1;
  }
}

I realize this isn't a loopy case, but if we decided, in our infinite wisdom, that it was profitable to have a j == 1 block, a j == 2 block, and an else block, we'd end up with both the j == 1 and j == 2 block having DefMUDI == MSSA->getDefiningAccess(), no?

545

nit: no braces, please

559

nit: does this line need to be wrapped?

(also, please remove the const in *const; it's inconsistent with the remainder of this file)

577

nit: can this be an ArrayRef?

601

nit: if (!Rev...empty())

604

nit: Is it possible/easy to update DT in-place and undo the updates? The tool I'm using to grok this says that this line always lands in CalculateWithUpdates, which appears to always call CalculateFromScratch. It sometimes gets things wrong with templated code, though, so I could be assuming the wrong thing here.

625

(Can this be const, or are we updating it somewhere here?)

627

nit: it looks like all the recursion here is tail recursion. Is it possible/easy to rewrite this as a loop with 100% fewer std::function::operator() invocations? :)

635

nit: s/count/Count/

662

(if there's no reason to add it now, please add a quick TODO with kuhar's comment in case it becomes more desirable in the future)

663

nit: const SmallPtrSetImpl<...> &?

665

nit: auto * (here and elsewhere in the patch. In general, I think the accepted style for for loops is to prefer const auto & if it's a not-pointer, and const auto * if it's a pointer. Dropping const (and & in the first case) as necessary)

689

It looks like we expect PrevPredMap and AddedPredMap to have the same set of keys, and they're sorta related. Would it make sense to have something like

struct PredInfo {
  SmallPtrSet<...> Added;
  SmallPtrSet<...> Prev;
};
SmallDenseMap<BasicBlock *, PredInfo> PredMap;

instead, so the relationship is a bit more clear?

(Or maybe one of these will need to be a SmallSetVector or similar, given the ordering concerns below?)

703

To be sure I'm understanding (I'm not overly familiar with the DT APIs), this is saying "for all predecessors of BB, but also including additions in GD, { /* loop body */ }", yeah?

710

nit: if (PrevBlockSet.empty())

733

FWIW, it looks like this'll make MSSA subtly (and in a way that doesn't matter until we print it) depend on set ordering, which may be bad for writing tests and such.

Since we can iterate over elems in PrevPredMap in any order, we'll be creating these Phis in any order. Since we assign Phis IDs in the order in which they're created ...

In this case, looks like we can just iterate directly over Updates and the problem goes away (assuming Updates is populated in a deterministic order)?

739

nit: if you don't plan on mutating the &, make it a const &, please (and below)

741

nit: !empty()

755

Same (potential) nondeterministic ordering concerns.

760

Consider:

   entry
     |
     a
    / \
   |   c
    \ /
     d

a:
  ; 1 = MemoryDef(liveOnEntry)
c:
  ; 2 = MemoryDef(1)
d:
  ; 3 = MemoryPhi({c, 2}, {a, 1})
  ; 4 = MemoryDef(3); optimized to 1

And we add an edge from entry -> d. I don't claim to know when/if this can reasonably happen, but ...

782

Would it be more direct to just RAUW this Phi with DefP1, then remove it?

789

Same ordering concerns. Maybe this time we could just iterate over BB's preds and fold these two loops into one? As for additions, we may need to have AddedPredMap preserve the order we observe from Updates?

(That would also let us get rid of EdgeCountMap, it seems. Assuming we don't need it when we refactor the loop above)

807

Is it possible for BlocksWithDefsToReplace to get duplicates? If so, might it be a good idea to replace that with a SmallPtrSet?

816

(I'm assuming that there's some sort of guarantee that IDF calculations only pass these BBs into the DT. If it tries to access preds/succs of a BB, we might get badness, since the DT here may be pretending that we have CFG edges that we don't? Or did I miss us temporarily re-adding edges that were deleted between BBs?)

832

Are we sure that we only need to consider BlocksWithDefsToReplace here?

I could be totally misunderstanding, but imagine the following scenario (I'm not even going to try to invent an excuse for when this could happen. :) Just wanted to know if it was considered):

   entry
   /  |
  a   |
 / \  |
 b c  f
 \ /  |
  d   |
   \ /
    e

f:
  ; 1 = MemoryDef(liveOnEntry)
e:
  ; 2 = MemoryPhi({d, liveOnEntry}, {f, 1})

Now imagine that we add an edge f -> c. AFAICT, that would cause us to add a Phi to c (way above) and a Phi to d (idf loop), but I'm failing to see where we'd update the Phi in e to point to the new Phi in d. (Since the only no-longer-dominating BB is a)

Similarly, if d had a MemoryDef in it before we added this edge, would its DefiningAccess get pointed appropriately at this new Phi in d?

835

nit: can we scope this more tightly please? e.g. BasicBlock *DominatedBlock = UsrPhi->... and similar in the else

839

nit: Can we just do a range-for loop? Looks like we're only calling set on the use, which shouldn't cause any invalidation badness.

842

s/dyn_cast/cast/.

If a non-MemoryAccess is using a MemoryAccess, we have problems. :)

852

The O(Blocks.size() * AverageBlockMemoryAccessesCount * AverageMemoryAccessUserCount * (Blocks.size() + IDF.size())) (assuming dominates is constant time) behavior here is somewhat concerning to me, though idk if that's an unfounded concern.

If the user is a Def, it's probably best to just deoptimize it; we'll just start from the Def's defining access in the walker anyway. (...Or, well, we would if we noticed that the optimization is now invalid. For reasons unclear to me looking into history + comments, MemoryDef::isOptimized checks the Def's *defining* access, rather than its optimized access. Which is almost definitely incorrect, at least with the aliasing bits we now cache. Working on that now.)

If it's not the Def's optimized access that we're dealing with, then the only new Access that can work is MSSA->getMemoryAccess(U->getBlock()); (since the only way that can happen is if that Def is the first access in a BB, and we've just inserted that block into said BB).

If it's a Use, we'll try to reoptimize starting from its DefiningAccess, which will be whatever this Use points to. It's unclear what the right trade-off is there, since the more aggressive we are here, the less walking we *may* have to do in the future. Depending on the amount of luck involved in this loop's iteration order (e.g. if we remove the break;, is it possible for U to point to something different when the loop terminates? Is that value going to be better, worse, or are we uncertain? The feeling I get is "yes; uncertain," but I haven't fully reasoned through it,) it may be best to just say "point it at the phi of the Use's BB. If that doesn't exist, walk IDoms until we find the last Def."

(Edit: ...Did I never make MemoryDefs stop using WeakVHes to track what they're optimized to? OK. I'll work on that in parallel, too, since I imagine that's pretty necessary for this to work.)

Thanks a lot of the comments, working to address them.

A couple of quick clarifications.
Yes, enabling memoryssa is guarded by a flag, see dependent patches (unswitch, simple loop unswitch; the tests for those passes enable the flag so we do check correctness) and a couple of additional patches already landed to update memoryssa, again guarded and currently disabled, but the tests for those passes also enable the flag (the landed patches don't use any of the apis in this patch, hence no dependency).

comments-related:
Currently it does do CalculateFromScratch; it's creating a new DT based on a snapshot CFG where the "Delete" edges are undone. It updates MSSA only for "Inserts" added, then doing the "Deletes".
I have additional work I need to do to cleanup DT to use a GraphDiff internally as part of its update process (replace BatchUpdateInfo), and then I should be able to have this be an incremental update of doing the deletes in reverse (i.e. re-insert edges).
As for the IDF accessing succ/pred, it does! I taught it to use a snapshot graph too (GraphDiff passed as arg) in D50675.
Using a GraphDiff and the GraphTraits defined in CFGDiff.h should be the common interface for all.
Long-story short, there's a lot more work to be done before this is enabled :)

asbirlea marked 32 inline comments as done.Aug 28 2018, 11:21 AM

I'm planning to add a few more unittests, but sending the updates now for additional comments.

lib/Analysis/MemorySSAUpdater.cpp
604

At the moment we are creating a new DT. I should be able to optimize this after cleaning up DT to use GraphDiff internally. For now, added a comment.

703

Yes. The additions in GD in this case are edges that were removed and that we're reinserting for the scope of this update.

760

Fixed. Even if a Phi exists, continue to the section below that updates defs in BlocksWithDefsToReplace. Added test case as well.

807

We cannot get duplicates. We're walking up the DT, no cycles.

816

The GD passes as argument to IDF takes care of that.

832

This should be handled in the IDF loop above, fixed there.
Added testcase for this scenario.

839

I may be missing something, but if I do:
for (Use &U : DefToReplaceUses.uses()), I get a crash.
I guess the set() does invalidate the iterator? See below, the ++UI needs to be before U.set().

852

This should be fixed. Added testcase as well.

asbirlea updated this revision to Diff 162912.Aug 28 2018, 11:22 AM
asbirlea marked 8 inline comments as done.

Address comments.

asbirlea updated this revision to Diff 163598.Aug 31 2018, 2:41 PM

Fix bug introduced by refactoring.

george.burgess.iv accepted this revision.Sep 5 2018, 5:02 PM

LGTM. Thanks for this!

lib/Analysis/MemorySSAUpdater.cpp
698

s/teh/the

703

s/ / / (two spaces -> one)

839

That's surprising, but WFM. :)

This revision is now accepted and ready to land.Sep 5 2018, 5:02 PM
asbirlea updated this revision to Diff 164133.Sep 5 2018, 5:12 PM

Small update triggered by a test in rotate (patch to be uploaded soon).

Thank you for the patience on this review! :)

I'm doing some additional offline testing before landing (+ the nits you noticed).

asbirlea updated this revision to Diff 164710.Sep 10 2018, 11:49 AM

Final update before commit: changing a cast_or_null to dyn_cast_or_null for cloned instructions. LoopRotate can place Value in the map, as the clone of an instruction.

This revision was automatically updated to reflect the committed changes.