Page MenuHomePhabricator

API to update MemorySSA for cloned blocks.

Authored by asbirlea on Apr 4 2018, 4:49 PM.



End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.

Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
127 ↗(On Diff #160698)

Same 'please constify if possible' nit, though please make this an ArrayRef instead of const SmallVectorImpl &

131 ↗(On Diff #160698)

nit: why do we take a &DT in this one and a *DT in the other?

267 ↗(On Diff #160698)

It's unclear to me what the relation between phis/definitions in "a map of MemoryPhi : Definition" is here. Please elaborate

1548 ↗(On Diff #160698)

Why || defined(LLVM_ENABLE_DUMP)?

1553 ↗(On Diff #160698)

nit: s/== true//

404 ↗(On Diff #160698)

naming nit: removeAllDuplicatePhiEdgesBetween?

425 ↗(On Diff #160698)

please use dyn_cast if MA shouldn't be null

426 ↗(On Diff #160698)

nit: this should only be null if MA is liveOnEntry. Please check against that instead, for clarity

430 ↗(On Diff #160698)

same "please don't use or_null if null isn't OK" nit. (though it looks like we could do else { cast<MemoryPhi> here, since we're checking for MUD above?)

441 ↗(On Diff #160698)


484 ↗(On Diff #160698)


485 ↗(On Diff #160698)

Please check against liveOnEntry instead

494 ↗(On Diff #160698)

s/or_null// (and we can probably change this into else { cast<MemoryPhi> anyway)

550 ↗(On Diff #160698)

can this be a range-for across ExitBlocks?

590 ↗(On Diff #160698)

nit: please either remove the braces after the if or add them after the else. I'm impartial

697 ↗(On Diff #160698)

probably redundant with the line below (unless it's possible for us to see the same BB repeatedly?)

702 ↗(On Diff #160698)

Looks like you can replace everything between { ... } with EdgeCountMap[{Pi, BB}]++;.

operator[] on map-like containers generally acts like "return a reference if it exists. otherwise, construct or zero-init a member in the map and return a ref to that"

708 ↗(On Diff #160698)

same "looks like you can replace everything between ..."

720 ↗(On Diff #160698)

nit: There should probably be spaces before the end of each of these quotes

866 ↗(On Diff #160698)

nit: I think this is long enough that it's best to just have braces after the for and if on lines 837/838. :)

asbirlea updated this revision to Diff 160935.Aug 15 2018, 4:00 PM
asbirlea marked 21 inline comments as done.

Address comments.
Thank you for the detailed nits :).

1548 ↗(On Diff #160698)

copy-paste :)..removed.

702 ↗(On Diff #160698)

Ah, I wasn't sure it was zero-inited.

asbirlea updated this revision to Diff 160938.Aug 15 2018, 4:14 PM

Few nits.

asbirlea updated this revision to Diff 160942.Aug 15 2018, 4:38 PM

Rename DTUpdate to CFGUpdate, since now we're using the Update in Support.

kuhar added a subscriber: kuhar.Aug 20 2018, 1:54 PM
kuhar added inline comments.
662 ↗(On Diff #160942)

I don't think this will be a bottleneck, but this can be optimized by starting the search at the node with the lowest level (highest in the tree).

Lots of nits, and a few functional concerns. For the functional concerns that revolve around dubious edge additions, I'm happy with fixes or "it's impossible for that to actually happen; added a comment explaining that."

(If we do take the "that's impossible; have a comment" route, I assume that we're gating the new loop+MSSA transforms behind some loop-enable-mssa-like flag for a while to gain confidence about the correctness of this all?)

I also think we should have unittests (simple ones are perfectly OK) for the new updater functionality, please.

Like said, this review gave me a few things to do. I'll ping this with revisions when all of that's landed...

429 ↗(On Diff #160942)

It's not immediately clear to me that this can't be tripped in general. e.g.

void foo(int j, int *i) {
  if (j == 1 || j == 2) {
    *i = 0;
  } else {
    *i = 1;

I realize this isn't a loopy case, but if we decided, in our infinite wisdom, that it was profitable to have a j == 1 block, a j == 2 block, and an else block, we'd end up with both the j == 1 and j == 2 block having DefMUDI == MSSA->getDefiningAccess(), no?

545 ↗(On Diff #160942)

nit: no braces, please

559 ↗(On Diff #160942)

nit: does this line need to be wrapped?

(also, please remove the const in *const; it's inconsistent with the remainder of this file)

577 ↗(On Diff #160942)

nit: can this be an ArrayRef?

601 ↗(On Diff #160942)

nit: if (!Rev...empty())

604 ↗(On Diff #160942)

nit: Is it possible/easy to update DT in-place and undo the updates? The tool I'm using to grok this says that this line always lands in CalculateWithUpdates, which appears to always call CalculateFromScratch. It sometimes gets things wrong with templated code, though, so I could be assuming the wrong thing here.

625 ↗(On Diff #160942)

(Can this be const, or are we updating it somewhere here?)

627 ↗(On Diff #160942)

nit: it looks like all the recursion here is tail recursion. Is it possible/easy to rewrite this as a loop with 100% fewer std::function::operator() invocations? :)

635 ↗(On Diff #160942)

nit: s/count/Count/

662 ↗(On Diff #160942)

(if there's no reason to add it now, please add a quick TODO with kuhar's comment in case it becomes more desirable in the future)

663 ↗(On Diff #160942)

nit: const SmallPtrSetImpl<...> &?

665 ↗(On Diff #160942)

nit: auto * (here and elsewhere in the patch. In general, I think the accepted style for for loops is to prefer const auto & if it's a not-pointer, and const auto * if it's a pointer. Dropping const (and & in the first case) as necessary)

689 ↗(On Diff #160942)

It looks like we expect PrevPredMap and AddedPredMap to have the same set of keys, and they're sorta related. Would it make sense to have something like

struct PredInfo {
  SmallPtrSet<...> Added;
  SmallPtrSet<...> Prev;
SmallDenseMap<BasicBlock *, PredInfo> PredMap;

instead, so the relationship is a bit more clear?

(Or maybe one of these will need to be a SmallSetVector or similar, given the ordering concerns below?)

703 ↗(On Diff #160942)

To be sure I'm understanding (I'm not overly familiar with the DT APIs), this is saying "for all predecessors of BB, but also including additions in GD, { /* loop body */ }", yeah?

710 ↗(On Diff #160942)

nit: if (PrevBlockSet.empty())

733 ↗(On Diff #160942)

FWIW, it looks like this'll make MSSA subtly (and in a way that doesn't matter until we print it) depend on set ordering, which may be bad for writing tests and such.

Since we can iterate over elems in PrevPredMap in any order, we'll be creating these Phis in any order. Since we assign Phis IDs in the order in which they're created ...

In this case, looks like we can just iterate directly over Updates and the problem goes away (assuming Updates is populated in a deterministic order)?

739 ↗(On Diff #160942)

nit: if you don't plan on mutating the &, make it a const &, please (and below)

741 ↗(On Diff #160942)

nit: !empty()

755 ↗(On Diff #160942)

Same (potential) nondeterministic ordering concerns.

760 ↗(On Diff #160942)


    / \
   |   c
    \ /

  ; 1 = MemoryDef(liveOnEntry)
  ; 2 = MemoryDef(1)
  ; 3 = MemoryPhi({c, 2}, {a, 1})
  ; 4 = MemoryDef(3); optimized to 1

And we add an edge from entry -> d. I don't claim to know when/if this can reasonably happen, but ...

782 ↗(On Diff #160942)

Would it be more direct to just RAUW this Phi with DefP1, then remove it?

789 ↗(On Diff #160942)

Same ordering concerns. Maybe this time we could just iterate over BB's preds and fold these two loops into one? As for additions, we may need to have AddedPredMap preserve the order we observe from Updates?

(That would also let us get rid of EdgeCountMap, it seems. Assuming we don't need it when we refactor the loop above)

807 ↗(On Diff #160942)

Is it possible for BlocksWithDefsToReplace to get duplicates? If so, might it be a good idea to replace that with a SmallPtrSet?

816 ↗(On Diff #160942)

(I'm assuming that there's some sort of guarantee that IDF calculations only pass these BBs into the DT. If it tries to access preds/succs of a BB, we might get badness, since the DT here may be pretending that we have CFG edges that we don't? Or did I miss us temporarily re-adding edges that were deleted between BBs?)

832 ↗(On Diff #160942)

Are we sure that we only need to consider BlocksWithDefsToReplace here?

I could be totally misunderstanding, but imagine the following scenario (I'm not even going to try to invent an excuse for when this could happen. :) Just wanted to know if it was considered):

   /  |
  a   |
 / \  |
 b c  f
 \ /  |
  d   |
   \ /

  ; 1 = MemoryDef(liveOnEntry)
  ; 2 = MemoryPhi({d, liveOnEntry}, {f, 1})

Now imagine that we add an edge f -> c. AFAICT, that would cause us to add a Phi to c (way above) and a Phi to d (idf loop), but I'm failing to see where we'd update the Phi in e to point to the new Phi in d. (Since the only no-longer-dominating BB is a)

Similarly, if d had a MemoryDef in it before we added this edge, would its DefiningAccess get pointed appropriately at this new Phi in d?

835 ↗(On Diff #160942)

nit: can we scope this more tightly please? e.g. BasicBlock *DominatedBlock = UsrPhi->... and similar in the else

839 ↗(On Diff #160942)

nit: Can we just do a range-for loop? Looks like we're only calling set on the use, which shouldn't cause any invalidation badness.

842 ↗(On Diff #160942)


If a non-MemoryAccess is using a MemoryAccess, we have problems. :)

852 ↗(On Diff #160942)

The O(Blocks.size() * AverageBlockMemoryAccessesCount * AverageMemoryAccessUserCount * (Blocks.size() + IDF.size())) (assuming dominates is constant time) behavior here is somewhat concerning to me, though idk if that's an unfounded concern.

If the user is a Def, it's probably best to just deoptimize it; we'll just start from the Def's defining access in the walker anyway. (...Or, well, we would if we noticed that the optimization is now invalid. For reasons unclear to me looking into history + comments, MemoryDef::isOptimized checks the Def's *defining* access, rather than its optimized access. Which is almost definitely incorrect, at least with the aliasing bits we now cache. Working on that now.)

If it's not the Def's optimized access that we're dealing with, then the only new Access that can work is MSSA->getMemoryAccess(U->getBlock()); (since the only way that can happen is if that Def is the first access in a BB, and we've just inserted that block into said BB).

If it's a Use, we'll try to reoptimize starting from its DefiningAccess, which will be whatever this Use points to. It's unclear what the right trade-off is there, since the more aggressive we are here, the less walking we *may* have to do in the future. Depending on the amount of luck involved in this loop's iteration order (e.g. if we remove the break;, is it possible for U to point to something different when the loop terminates? Is that value going to be better, worse, or are we uncertain? The feeling I get is "yes; uncertain," but I haven't fully reasoned through it,) it may be best to just say "point it at the phi of the Use's BB. If that doesn't exist, walk IDoms until we find the last Def."

(Edit: ...Did I never make MemoryDefs stop using WeakVHes to track what they're optimized to? OK. I'll work on that in parallel, too, since I imagine that's pretty necessary for this to work.)

Thanks a lot of the comments, working to address them.

A couple of quick clarifications.
Yes, enabling memoryssa is guarded by a flag, see dependent patches (unswitch, simple loop unswitch; the tests for those passes enable the flag so we do check correctness) and a couple of additional patches already landed to update memoryssa, again guarded and currently disabled, but the tests for those passes also enable the flag (the landed patches don't use any of the apis in this patch, hence no dependency).

Currently it does do CalculateFromScratch; it's creating a new DT based on a snapshot CFG where the "Delete" edges are undone. It updates MSSA only for "Inserts" added, then doing the "Deletes".
I have additional work I need to do to cleanup DT to use a GraphDiff internally as part of its update process (replace BatchUpdateInfo), and then I should be able to have this be an incremental update of doing the deletes in reverse (i.e. re-insert edges).
As for the IDF accessing succ/pred, it does! I taught it to use a snapshot graph too (GraphDiff passed as arg) in D50675.
Using a GraphDiff and the GraphTraits defined in CFGDiff.h should be the common interface for all.
Long-story short, there's a lot more work to be done before this is enabled :)

asbirlea marked 32 inline comments as done.Aug 28 2018, 11:21 AM

I'm planning to add a few more unittests, but sending the updates now for additional comments.

604 ↗(On Diff #160942)

At the moment we are creating a new DT. I should be able to optimize this after cleaning up DT to use GraphDiff internally. For now, added a comment.

703 ↗(On Diff #160942)

Yes. The additions in GD in this case are edges that were removed and that we're reinserting for the scope of this update.

760 ↗(On Diff #160942)

Fixed. Even if a Phi exists, continue to the section below that updates defs in BlocksWithDefsToReplace. Added test case as well.

807 ↗(On Diff #160942)

We cannot get duplicates. We're walking up the DT, no cycles.

816 ↗(On Diff #160942)

The GD passes as argument to IDF takes care of that.

832 ↗(On Diff #160942)

This should be handled in the IDF loop above, fixed there.
Added testcase for this scenario.

839 ↗(On Diff #160942)

I may be missing something, but if I do:
for (Use &U : DefToReplaceUses.uses()), I get a crash.
I guess the set() does invalidate the iterator? See below, the ++UI needs to be before U.set().

852 ↗(On Diff #160942)

This should be fixed. Added testcase as well.

asbirlea updated this revision to Diff 162912.Aug 28 2018, 11:22 AM
asbirlea marked 8 inline comments as done.

Address comments.

asbirlea updated this revision to Diff 163598.Aug 31 2018, 2:41 PM

Fix bug introduced by refactoring.

george.burgess.iv accepted this revision.Sep 5 2018, 5:02 PM

LGTM. Thanks for this!

839 ↗(On Diff #160942)

That's surprising, but WFM. :)

699 ↗(On Diff #163598)


704 ↗(On Diff #163598)

s/ / / (two spaces -> one)

This revision is now accepted and ready to land.Sep 5 2018, 5:02 PM
asbirlea updated this revision to Diff 164133.Sep 5 2018, 5:12 PM

Small update triggered by a test in rotate (patch to be uploaded soon).

Thank you for the patience on this review! :)

I'm doing some additional offline testing before landing (+ the nits you noticed).

asbirlea updated this revision to Diff 164710.Sep 10 2018, 11:49 AM

Final update before commit: changing a cast_or_null to dyn_cast_or_null for cloned instructions. LoopRotate can place Value in the map, as the clone of an instruction.

This revision was automatically updated to reflect the committed changes.