This is an archive of the discontinued LLVM Phabricator instance.

API to update MemorySSA for cloned blocks.
ClosedPublic

Authored by asbirlea on Apr 4 2018, 4:49 PM.

Details

Summary

End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.

Diff Detail

Event Timeline

asbirlea created this revision.Apr 4 2018, 4:49 PM

Uploaded D45300 and D45301 to give usage context.

This is a DRAFT API based on what I found needed in the old LoopUnswitch pass.

To be clear, does this mean "I only would like feedback on the APIs for the moment; please don't nitpick the actual code before that's set in stone," or "I'd like a full code review, but I'm more than happy to refactor the APIs to something better if we can think of it"?

I'm assuming the former, but am also happy to nitpick. :)

First concern is duplicating functionality already present in MemorySSA. I honestly don't understand everything well enough, so it's possible there are chunks that can be factored out from existing methods and the ones in this patch.

Second concern is the best way to expose an update API. Until now, each update call pretty much assumed valid MemorySSA before and after the call. I'm adding APIs where it's known that MemorySSA is broken before the call.

Happy to get nitpicking after the high-level view, just to avoid the distraction.
But if you see something that bugs you now, please feel free to nitpick so we get *that* distraction out of the way :).

Also note that, while the stack of 3 patches doesn't crash any tests, it's not necessarily error-free. It should however give a good view of the end goal.

dberlin added inline comments.Apr 4 2018, 11:23 PM
include/llvm/Analysis/MemorySSA.h
813

I don't understand the desire here, since it already should correctly identify whether it is a def or use on it's own.
What are you trying to do?

asbirlea added inline comments.Apr 5 2018, 8:11 AM
include/llvm/Analysis/MemorySSA.h
813

I'm trying to avoid the additional getModRefInfo call.

For cloned blocks we have a 1:1 mapping of all instructions. E.g., we know we've just cloned I, with MUD access that's a Use with some defining access. I'm trying to create a MUD for the clone instruction with all that info. The use case is in "CloneUsesAndDefs" below.

Perhaps it's not worth the effort, just trying to avoid AA calls where possible.

(Thanks for the patch, by the way!)

So, I've looked at the API at a high level, and I think it's going in a good direction with some tweaks/naming changes. Naturally, whatever we end up standardizing on will need a fair amount of comments explaining what, exactly, it's trying to do/preconditions/etc. :)

FWIW, the naming suggestions are in part to help me be sure I'm not missing anything big WRT respected use-cases, ... So they're a bit shaky at times.

include/llvm/Analysis/MemorySSA.h
813

Would it be clearer to pass MemoryUseOrDef *Template and have this function interrogate it, instead of bool Use and bool Def (or split off a cloneMemoryAccess function to go with the whole cloning theme we're doing)? It's a bit more explicit about what you're trying to do (IMO), and it looks like it would simplify calling code a bit.

(Same with createDefinedAccess)

lib/Analysis/MemorySSAUpdater.cpp
393

updateForClonedLoop?

Given that we talk about entry blocks, exit blocks, etc. insertAccessesFromMap sounds pretty generic.

523

Insert phis if needed because... what happened? :)

I tried to read the loop code, but got a few layers deep in unfamiliar jargon and gave up.

557

General note: I think that Seen + Worklist combos can be replaced with a SmallSetVector. Not something we need to worry about at this very second, but...

814

So... "I just moved everything from Start to the end of To from From to To (I'm so glad we have backticks to disambiguate code from words). Please update MSSA to note that." ?

I'm assuming there are serious constraints on when this can be used. Moreover, I'm assuming that any use is invalid unless either:

  • From was merged into To and is about to be deleted entirely, and From/To logically formed one basic block before the merge (e.g. From's only predecessor was To, and To's only successor was From)
  • ...Or To is the result of chopping From in half starting at Start, so they still both logically form one basic block (with the pred/succ constraints mentioned above)

So, from a wording standpoint, I sort of think we should spell out both of those use-cases as separate methods (mergeSplicedBlocks and spliceBlocks?). If they both end up as one-line wrappers around this private moveInBulk, I'm happy with that. I'd just rather not advertise "hey, this is a general bulk move operation. Come to it with all your bulk memory moving needs." Plus, if we're specific in what we allow, we can throw in more asserts to catch mistakes. :)

816

I don't see a case in the given usage examples where this needs to be anything but End, and I suspect this gets a bit more complicated if Where != End. Until we need that functionality, can we please just assume it's always End for now? (D45300 uses Beginning, but AFAICT, it's a clean BB anyway. So, End should be equivalent?)

850

Why can't this just be "for each user of the last def in To that happens to be a Phi and has From as the old block, update its basic block"?

(We'd also need to special-case when Where != End here, probably, since the last Def in From might not be the last Def in To)

886

movePhiToNewImmediatePredecessor maybe?

It's a bit of a mouthful, yeah...

903

Yeah. This seems like the better approach to me, since this phi we're creating is identical (from the user's perspective, at least) to the previous one. Faster + (I assume) easier to think about + less code = yay.

asbirlea updated this revision to Diff 145098.May 3 2018, 2:52 PM
asbirlea marked 9 inline comments as done.

Address comments.

asbirlea added inline comments.May 3 2018, 2:55 PM
include/llvm/Analysis/MemorySSA.h
813

Updated to MemoryUseOrDef *Template = nullptr.

lib/Analysis/MemorySSAUpdater.cpp
523

Update to updatePhisWhenAddingBBPredecessors.
Kinda long, but at least more insightful? :-/

557

Can you elaborate on this one? The Seen and Worklist structures keep different data, but perhaps I'm missing something here.
Btw, I used the same pattern I saw in fixupDefs, for some consistency.

814

You're right. Created moveAllAfterSpliceBlocks and moveAllAfterMergeBlocks.
Both call private member moveAllAccesses.

850

This can be called after optimizeUses and some defs may be optimized too. Doesn't this mean that any moved def in To can have users in a Phi, not just the last one?

It should work to go over all Defs in To, and all their users and if those are Phis with From incoming update to To. I think that may be more expensive, if there are a lot of Defs and few successors (transitive too), but it can go both ways.

903

Done, but I needed to modify MemorySSA's API to include MemoryPhis: generalized to moveTo(MemoryAccess *).

AFAICT this should be fine, since that call can only used by the updater, and this is the only case where a MemoryPhi is moved.
The alternative is to add another MemorySSA API: moveTo(MemoryPhi *, BasicBlock *) and keep the old one with MemoryUseOrDef. The advantage being the removal of the insertion point. I don't have a strong opinion here.

PTAL.

asbirlea updated this revision to Diff 147218.May 16 2018, 5:22 PM

Update patch with the following:

  • Fix a few bugs that were triggered by tests in SimpleLoopUnswitch
  • Slightly modify current APIs and add a couple APIs motivated by SimpleLoopUnswitch
  • Add APIs to delete blocks (SimpleLoopUnswitch modifies the CFG heavily) and MemoryPhi entries.

This is becoming a pretty complex patch, so if you think at least the removeBlocks/delete from MPhis should be in a separate patch, I can split them.

I will upload the patch that updates MSSA in SimpleLoopUnswitch as a dependecy as well.

As far as testing, all tests pass with the sequence of patches, but I am looking into additional testing.

asbirlea updated this revision to Diff 147539.May 18 2018, 9:52 AM

Some cleanup and clang-format.

(FWIW, Alina and I chatted about this offline. She has a few changes queued up; I'm adding my nits on unchanged bits now, and will do a thorough code review once the new patch is posted)

include/llvm/Analysis/MemorySSAUpdater.h
93

nit: all defs, uses, and phis

133

I like the ascii art :)

144

Nit: is Start the instruction that was previously the last instruction in To?

159

Nit: "was added additional predecessors" should probably be worded something like "BB had additional predecessors added"

lib/Analysis/MemorySSAUpdater.cpp
523

Much better. :)

I'm inclined to think updatePhisForNewBBPredecessors might be a bit more clear, but I'm also fine with this if you disagree.

527

Please also note that uncloned exit blocks don't have predecessors/successors added

557

...Do they? AFAICT, conceptually, this code does a BFS across nodes in our graph of BBs, and WorklistBB is the queue we use to visit everything (with Seen keeping us from revisiting). If I'm correct, SmallSetVector sounds like a sane thing here to me.

asbirlea updated this revision to Diff 150598.Jun 8 2018, 6:23 PM

This update is more a whole rewrite of the APIs.

  • I separated the cloning into an API that just clones (1:1 mapping) uses and defs in blocks
    • Cloning of blocks must be done in RPO order, assume MSSA api receives the block in the correct order.
    • Note that the cloning API will need to change (again) soon to take a vector of maps, once a pending SimpleLoopUnswitch patch (to make a set of clones of the same block at once) lands.
  • Exit blocks are handled separately using the api for blocks getting precedessors
  • All edges being added are handled in a single call (update phis when adding BB preds), and may end up looking a lot like the DT update (vector of some "update" type).
    • This call is adding Phis (if needed) in the BBs gaining predecessors and in the IDF of the blocks now defining accesses.
    • Next, it's fixing domination by replacing uses of all defs that may no longer dominate their uses with the nearest dominating def.
    • The algorithm to get this right is fairly complex and it does rely on the Def-Def optimizations being Uses, which is not currently the case.
  • APIs to move accesses updated based on future usage in unswitch
  • removeBlocks APis simplified in updater and in mssa.

I would appreciate high-level comments on the approach/algorithm first.

asbirlea updated this revision to Diff 153574.Jun 29 2018, 2:27 PM

Rebase after checking in delete APIs.
Updated algorithm for updatePhisWhenAddingBBPred to consider any number of predecessors added for each block.
Updated cloneUseDef signature to use an LoopBlocksRPO.
Updated cloneUseDef to include an API with multiple VMaps, per the SimpleLoopUnswitch change.
Other cleanups.

asbirlea edited the summary of this revision. (Show Details)Jun 29 2018, 2:28 PM
asbirlea updated this revision to Diff 160698.Aug 14 2018, 2:55 PM

Rebase on recent changes.
Added APIs now greatly simplified. Most updates are now done through applyUpdates.

asbirlea marked 15 inline comments as done.Aug 14 2018, 3:11 PM
asbirlea updated this revision to Diff 160702.Aug 14 2018, 3:13 PM

Nit: add a comment.

I'm not entirely through this patch (I hope to make more progress in the coming few days), but this is what I have for now. Logic's looking solid so far; I'm just nitpicky. :)

include/llvm/Analysis/MemorySSAUpdater.h
104

Do we mutate LoopBlocks/VM anywhere? If not, please constify them (here and in later new functions)

116

Same 'please constify if possible' nit, though please make this an ArrayRef instead of const SmallVectorImpl &

include/llvm/Analysis/MemorySSAUpdater.h
120

nit: why do we take a &DT in this one and a *DT in the other?

236

It's unclear to me what the relation between phis/definitions in "a map of MemoryPhi : Definition" is here. Please elaborate

lib/Analysis/MemorySSA.cpp
1540

Why || defined(LLVM_ENABLE_DUMP)?

1545

nit: s/== true//

lib/Analysis/MemorySSAUpdater.cpp
401

naming nit: removeAllDuplicatePhiEdgesBetween?

422

please use dyn_cast if MA shouldn't be null

423

nit: this should only be null if MA is liveOnEntry. Please check against that instead, for clarity

427

same "please don't use or_null if null isn't OK" nit. (though it looks like we could do else { cast<MemoryPhi> here, since we're checking for MUD above?)

438

s/or_null//

481

s/or_null//

482

Please check against liveOnEntry instead

491

s/or_null// (and we can probably change this into else { cast<MemoryPhi> anyway)

547

can this be a range-for across ExitBlocks?

587

nit: please either remove the braces after the if or add them after the else. I'm impartial

694

probably redundant with the line below (unless it's possible for us to see the same BB repeatedly?)

699

Looks like you can replace everything between { ... } with EdgeCountMap[{Pi, BB}]++;.

operator[] on map-like containers generally acts like "return a reference if it exists. otherwise, construct or zero-init a member in the map and return a ref to that"

705

same "looks like you can replace everything between ..."

717

nit: There should probably be spaces before the end of each of these quotes

863

nit: I think this is long enough that it's best to just have braces after the for and if on lines 837/838. :)

asbirlea updated this revision to Diff 160935.Aug 15 2018, 4:00 PM
asbirlea marked 21 inline comments as done.

Address comments.
Thank you for the detailed nits :).

lib/Analysis/MemorySSA.cpp
1540

copy-paste :)..removed.

lib/Analysis/MemorySSAUpdater.cpp
699

Ah, I wasn't sure it was zero-inited.

asbirlea updated this revision to Diff 160938.Aug 15 2018, 4:14 PM

Few nits.

asbirlea updated this revision to Diff 160942.Aug 15 2018, 4:38 PM

Rename DTUpdate to CFGUpdate, since now we're using the Update in Support.

kuhar added a subscriber: kuhar.Aug 20 2018, 1:54 PM
kuhar added inline comments.
lib/Analysis/MemorySSAUpdater.cpp
659

I don't think this will be a bottleneck, but this can be optimized by starting the search at the node with the lowest level (highest in the tree).

Lots of nits, and a few functional concerns. For the functional concerns that revolve around dubious edge additions, I'm happy with fixes or "it's impossible for that to actually happen; added a comment explaining that."

(If we do take the "that's impossible; have a comment" route, I assume that we're gating the new loop+MSSA transforms behind some loop-enable-mssa-like flag for a while to gain confidence about the correctness of this all?)

I also think we should have unittests (simple ones are perfectly OK) for the new updater functionality, please.

Like said, this review gave me a few things to do. I'll ping this with revisions when all of that's landed...

lib/Analysis/MemorySSAUpdater.cpp
426

It's not immediately clear to me that this can't be tripped in general. e.g.

void foo(int j, int *i) {
  if (j == 1 || j == 2) {
    *i = 0;
  } else {
    *i = 1;
  }
}

I realize this isn't a loopy case, but if we decided, in our infinite wisdom, that it was profitable to have a j == 1 block, a j == 2 block, and an else block, we'd end up with both the j == 1 and j == 2 block having DefMUDI == MSSA->getDefiningAccess(), no?

542

nit: no braces, please

556

nit: does this line need to be wrapped?

(also, please remove the const in *const; it's inconsistent with the remainder of this file)

574

nit: can this be an ArrayRef?

598

nit: if (!Rev...empty())

601

nit: Is it possible/easy to update DT in-place and undo the updates? The tool I'm using to grok this says that this line always lands in CalculateWithUpdates, which appears to always call CalculateFromScratch. It sometimes gets things wrong with templated code, though, so I could be assuming the wrong thing here.

622

(Can this be const, or are we updating it somewhere here?)

624

nit: it looks like all the recursion here is tail recursion. Is it possible/easy to rewrite this as a loop with 100% fewer std::function::operator() invocations? :)

632

nit: s/count/Count/

659

(if there's no reason to add it now, please add a quick TODO with kuhar's comment in case it becomes more desirable in the future)

660

nit: const SmallPtrSetImpl<...> &?

662

nit: auto * (here and elsewhere in the patch. In general, I think the accepted style for for loops is to prefer const auto & if it's a not-pointer, and const auto * if it's a pointer. Dropping const (and & in the first case) as necessary)

686

It looks like we expect PrevPredMap and AddedPredMap to have the same set of keys, and they're sorta related. Would it make sense to have something like

struct PredInfo {
  SmallPtrSet<...> Added;
  SmallPtrSet<...> Prev;
};
SmallDenseMap<BasicBlock *, PredInfo> PredMap;

instead, so the relationship is a bit more clear?

(Or maybe one of these will need to be a SmallSetVector or similar, given the ordering concerns below?)

700

To be sure I'm understanding (I'm not overly familiar with the DT APIs), this is saying "for all predecessors of BB, but also including additions in GD, { /* loop body */ }", yeah?

707

nit: if (PrevBlockSet.empty())

730

FWIW, it looks like this'll make MSSA subtly (and in a way that doesn't matter until we print it) depend on set ordering, which may be bad for writing tests and such.

Since we can iterate over elems in PrevPredMap in any order, we'll be creating these Phis in any order. Since we assign Phis IDs in the order in which they're created ...

In this case, looks like we can just iterate directly over Updates and the problem goes away (assuming Updates is populated in a deterministic order)?

736

nit: if you don't plan on mutating the &, make it a const &, please (and below)

738

nit: !empty()

752

Same (potential) nondeterministic ordering concerns.

757

Consider:

   entry
     |
     a
    / \
   |   c
    \ /
     d

a:
  ; 1 = MemoryDef(liveOnEntry)
c:
  ; 2 = MemoryDef(1)
d:
  ; 3 = MemoryPhi({c, 2}, {a, 1})
  ; 4 = MemoryDef(3); optimized to 1

And we add an edge from entry -> d. I don't claim to know when/if this can reasonably happen, but ...

779

Would it be more direct to just RAUW this Phi with DefP1, then remove it?

786

Same ordering concerns. Maybe this time we could just iterate over BB's preds and fold these two loops into one? As for additions, we may need to have AddedPredMap preserve the order we observe from Updates?

(That would also let us get rid of EdgeCountMap, it seems. Assuming we don't need it when we refactor the loop above)

804

Is it possible for BlocksWithDefsToReplace to get duplicates? If so, might it be a good idea to replace that with a SmallPtrSet?

813

(I'm assuming that there's some sort of guarantee that IDF calculations only pass these BBs into the DT. If it tries to access preds/succs of a BB, we might get badness, since the DT here may be pretending that we have CFG edges that we don't? Or did I miss us temporarily re-adding edges that were deleted between BBs?)

829

Are we sure that we only need to consider BlocksWithDefsToReplace here?

I could be totally misunderstanding, but imagine the following scenario (I'm not even going to try to invent an excuse for when this could happen. :) Just wanted to know if it was considered):

   entry
   /  |
  a   |
 / \  |
 b c  f
 \ /  |
  d   |
   \ /
    e

f:
  ; 1 = MemoryDef(liveOnEntry)
e:
  ; 2 = MemoryPhi({d, liveOnEntry}, {f, 1})

Now imagine that we add an edge f -> c. AFAICT, that would cause us to add a Phi to c (way above) and a Phi to d (idf loop), but I'm failing to see where we'd update the Phi in e to point to the new Phi in d. (Since the only no-longer-dominating BB is a)

Similarly, if d had a MemoryDef in it before we added this edge, would its DefiningAccess get pointed appropriately at this new Phi in d?

832

nit: can we scope this more tightly please? e.g. BasicBlock *DominatedBlock = UsrPhi->... and similar in the else

836

nit: Can we just do a range-for loop? Looks like we're only calling set on the use, which shouldn't cause any invalidation badness.

839

s/dyn_cast/cast/.

If a non-MemoryAccess is using a MemoryAccess, we have problems. :)

849

The O(Blocks.size() * AverageBlockMemoryAccessesCount * AverageMemoryAccessUserCount * (Blocks.size() + IDF.size())) (assuming dominates is constant time) behavior here is somewhat concerning to me, though idk if that's an unfounded concern.

If the user is a Def, it's probably best to just deoptimize it; we'll just start from the Def's defining access in the walker anyway. (...Or, well, we would if we noticed that the optimization is now invalid. For reasons unclear to me looking into history + comments, MemoryDef::isOptimized checks the Def's *defining* access, rather than its optimized access. Which is almost definitely incorrect, at least with the aliasing bits we now cache. Working on that now.)

If it's not the Def's optimized access that we're dealing with, then the only new Access that can work is MSSA->getMemoryAccess(U->getBlock()); (since the only way that can happen is if that Def is the first access in a BB, and we've just inserted that block into said BB).

If it's a Use, we'll try to reoptimize starting from its DefiningAccess, which will be whatever this Use points to. It's unclear what the right trade-off is there, since the more aggressive we are here, the less walking we *may* have to do in the future. Depending on the amount of luck involved in this loop's iteration order (e.g. if we remove the break;, is it possible for U to point to something different when the loop terminates? Is that value going to be better, worse, or are we uncertain? The feeling I get is "yes; uncertain," but I haven't fully reasoned through it,) it may be best to just say "point it at the phi of the Use's BB. If that doesn't exist, walk IDoms until we find the last Def."

(Edit: ...Did I never make MemoryDefs stop using WeakVHes to track what they're optimized to? OK. I'll work on that in parallel, too, since I imagine that's pretty necessary for this to work.)

Thanks a lot of the comments, working to address them.

A couple of quick clarifications.
Yes, enabling memoryssa is guarded by a flag, see dependent patches (unswitch, simple loop unswitch; the tests for those passes enable the flag so we do check correctness) and a couple of additional patches already landed to update memoryssa, again guarded and currently disabled, but the tests for those passes also enable the flag (the landed patches don't use any of the apis in this patch, hence no dependency).

comments-related:
Currently it does do CalculateFromScratch; it's creating a new DT based on a snapshot CFG where the "Delete" edges are undone. It updates MSSA only for "Inserts" added, then doing the "Deletes".
I have additional work I need to do to cleanup DT to use a GraphDiff internally as part of its update process (replace BatchUpdateInfo), and then I should be able to have this be an incremental update of doing the deletes in reverse (i.e. re-insert edges).
As for the IDF accessing succ/pred, it does! I taught it to use a snapshot graph too (GraphDiff passed as arg) in D50675.
Using a GraphDiff and the GraphTraits defined in CFGDiff.h should be the common interface for all.
Long-story short, there's a lot more work to be done before this is enabled :)

asbirlea marked 32 inline comments as done.Aug 28 2018, 11:21 AM

I'm planning to add a few more unittests, but sending the updates now for additional comments.

lib/Analysis/MemorySSAUpdater.cpp
601

At the moment we are creating a new DT. I should be able to optimize this after cleaning up DT to use GraphDiff internally. For now, added a comment.

700

Yes. The additions in GD in this case are edges that were removed and that we're reinserting for the scope of this update.

757

Fixed. Even if a Phi exists, continue to the section below that updates defs in BlocksWithDefsToReplace. Added test case as well.

804

We cannot get duplicates. We're walking up the DT, no cycles.

813

The GD passes as argument to IDF takes care of that.

829

This should be handled in the IDF loop above, fixed there.
Added testcase for this scenario.

836

I may be missing something, but if I do:
for (Use &U : DefToReplaceUses.uses()), I get a crash.
I guess the set() does invalidate the iterator? See below, the ++UI needs to be before U.set().

849

This should be fixed. Added testcase as well.

asbirlea updated this revision to Diff 162912.Aug 28 2018, 11:22 AM
asbirlea marked 8 inline comments as done.

Address comments.

asbirlea updated this revision to Diff 163598.Aug 31 2018, 2:41 PM

Fix bug introduced by refactoring.

george.burgess.iv accepted this revision.Sep 5 2018, 5:02 PM

LGTM. Thanks for this!

lib/Analysis/MemorySSAUpdater.cpp
695

s/teh/the

700

s/ / / (two spaces -> one)

836

That's surprising, but WFM. :)

This revision is now accepted and ready to land.Sep 5 2018, 5:02 PM
asbirlea updated this revision to Diff 164133.Sep 5 2018, 5:12 PM

Small update triggered by a test in rotate (patch to be uploaded soon).

Thank you for the patience on this review! :)

I'm doing some additional offline testing before landing (+ the nits you noticed).

asbirlea updated this revision to Diff 164710.Sep 10 2018, 11:49 AM

Final update before commit: changing a cast_or_null to dyn_cast_or_null for cloned instructions. LoopRotate can place Value in the map, as the clone of an instruction.

This revision was automatically updated to reflect the committed changes.