End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.
Details
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 19124 Build 19124: arc lint + arc unit
Event Timeline
This is a DRAFT API based on what I found needed in the old LoopUnswitch pass.
To be clear, does this mean "I only would like feedback on the APIs for the moment; please don't nitpick the actual code before that's set in stone," or "I'd like a full code review, but I'm more than happy to refactor the APIs to something better if we can think of it"?
I'm assuming the former, but am also happy to nitpick. :)
First concern is duplicating functionality already present in MemorySSA. I honestly don't understand everything well enough, so it's possible there are chunks that can be factored out from existing methods and the ones in this patch.
Second concern is the best way to expose an update API. Until now, each update call pretty much assumed valid MemorySSA before and after the call. I'm adding APIs where it's known that MemorySSA is broken before the call.
Happy to get nitpicking after the high-level view, just to avoid the distraction.
But if you see something that bugs you now, please feel free to nitpick so we get *that* distraction out of the way :).
Also note that, while the stack of 3 patches doesn't crash any tests, it's not necessarily error-free. It should however give a good view of the end goal.
include/llvm/Analysis/MemorySSA.h | ||
---|---|---|
813 | I don't understand the desire here, since it already should correctly identify whether it is a def or use on it's own. |
include/llvm/Analysis/MemorySSA.h | ||
---|---|---|
813 | I'm trying to avoid the additional getModRefInfo call. For cloned blocks we have a 1:1 mapping of all instructions. E.g., we know we've just cloned I, with MUD access that's a Use with some defining access. I'm trying to create a MUD for the clone instruction with all that info. The use case is in "CloneUsesAndDefs" below. Perhaps it's not worth the effort, just trying to avoid AA calls where possible. |
(Thanks for the patch, by the way!)
So, I've looked at the API at a high level, and I think it's going in a good direction with some tweaks/naming changes. Naturally, whatever we end up standardizing on will need a fair amount of comments explaining what, exactly, it's trying to do/preconditions/etc. :)
FWIW, the naming suggestions are in part to help me be sure I'm not missing anything big WRT respected use-cases, ... So they're a bit shaky at times.
include/llvm/Analysis/MemorySSA.h | ||
---|---|---|
813 | Would it be clearer to pass MemoryUseOrDef *Template and have this function interrogate it, instead of bool Use and bool Def (or split off a cloneMemoryAccess function to go with the whole cloning theme we're doing)? It's a bit more explicit about what you're trying to do (IMO), and it looks like it would simplify calling code a bit. (Same with createDefinedAccess) | |
lib/Analysis/MemorySSAUpdater.cpp | ||
393 | updateForClonedLoop? Given that we talk about entry blocks, exit blocks, etc. insertAccessesFromMap sounds pretty generic. | |
523 | Insert phis if needed because... what happened? :) I tried to read the loop code, but got a few layers deep in unfamiliar jargon and gave up. | |
557 | General note: I think that Seen + Worklist combos can be replaced with a SmallSetVector. Not something we need to worry about at this very second, but... | |
814 | So... "I just moved everything from Start to the end of To from From to To (I'm so glad we have backticks to disambiguate code from words). Please update MSSA to note that." ? I'm assuming there are serious constraints on when this can be used. Moreover, I'm assuming that any use is invalid unless either:
So, from a wording standpoint, I sort of think we should spell out both of those use-cases as separate methods (mergeSplicedBlocks and spliceBlocks?). If they both end up as one-line wrappers around this private moveInBulk, I'm happy with that. I'd just rather not advertise "hey, this is a general bulk move operation. Come to it with all your bulk memory moving needs." Plus, if we're specific in what we allow, we can throw in more asserts to catch mistakes. :) | |
816 | I don't see a case in the given usage examples where this needs to be anything but End, and I suspect this gets a bit more complicated if Where != End. Until we need that functionality, can we please just assume it's always End for now? (D45300 uses Beginning, but AFAICT, it's a clean BB anyway. So, End should be equivalent?) | |
850 | Why can't this just be "for each user of the last def in To that happens to be a Phi and has From as the old block, update its basic block"? (We'd also need to special-case when Where != End here, probably, since the last Def in From might not be the last Def in To) | |
886 | movePhiToNewImmediatePredecessor maybe? It's a bit of a mouthful, yeah... | |
903 | Yeah. This seems like the better approach to me, since this phi we're creating is identical (from the user's perspective, at least) to the previous one. Faster + (I assume) easier to think about + less code = yay. |
include/llvm/Analysis/MemorySSA.h | ||
---|---|---|
813 | Updated to MemoryUseOrDef *Template = nullptr. | |
lib/Analysis/MemorySSAUpdater.cpp | ||
523 | Update to updatePhisWhenAddingBBPredecessors. | |
557 | Can you elaborate on this one? The Seen and Worklist structures keep different data, but perhaps I'm missing something here. | |
814 | You're right. Created moveAllAfterSpliceBlocks and moveAllAfterMergeBlocks. | |
850 | This can be called after optimizeUses and some defs may be optimized too. Doesn't this mean that any moved def in To can have users in a Phi, not just the last one? It should work to go over all Defs in To, and all their users and if those are Phis with From incoming update to To. I think that may be more expensive, if there are a lot of Defs and few successors (transitive too), but it can go both ways. | |
903 | Done, but I needed to modify MemorySSA's API to include MemoryPhis: generalized to moveTo(MemoryAccess *). AFAICT this should be fine, since that call can only used by the updater, and this is the only case where a MemoryPhi is moved. PTAL. |
Update patch with the following:
- Fix a few bugs that were triggered by tests in SimpleLoopUnswitch
- Slightly modify current APIs and add a couple APIs motivated by SimpleLoopUnswitch
- Add APIs to delete blocks (SimpleLoopUnswitch modifies the CFG heavily) and MemoryPhi entries.
This is becoming a pretty complex patch, so if you think at least the removeBlocks/delete from MPhis should be in a separate patch, I can split them.
I will upload the patch that updates MSSA in SimpleLoopUnswitch as a dependecy as well.
As far as testing, all tests pass with the sequence of patches, but I am looking into additional testing.
(FWIW, Alina and I chatted about this offline. She has a few changes queued up; I'm adding my nits on unchanged bits now, and will do a thorough code review once the new patch is posted)
include/llvm/Analysis/MemorySSAUpdater.h | ||
---|---|---|
93 | nit: all defs, uses, and phis | |
133 | I like the ascii art :) | |
144 | Nit: is Start the instruction that was previously the last instruction in To? | |
159 | Nit: "was added additional predecessors" should probably be worded something like "BB had additional predecessors added" | |
lib/Analysis/MemorySSAUpdater.cpp | ||
523 | Much better. :) I'm inclined to think updatePhisForNewBBPredecessors might be a bit more clear, but I'm also fine with this if you disagree. | |
527 | Please also note that uncloned exit blocks don't have predecessors/successors added | |
557 | ...Do they? AFAICT, conceptually, this code does a BFS across nodes in our graph of BBs, and WorklistBB is the queue we use to visit everything (with Seen keeping us from revisiting). If I'm correct, SmallSetVector sounds like a sane thing here to me. |
This update is more a whole rewrite of the APIs.
- I separated the cloning into an API that just clones (1:1 mapping) uses and defs in blocks
- Cloning of blocks must be done in RPO order, assume MSSA api receives the block in the correct order.
- Note that the cloning API will need to change (again) soon to take a vector of maps, once a pending SimpleLoopUnswitch patch (to make a set of clones of the same block at once) lands.
- Exit blocks are handled separately using the api for blocks getting precedessors
- All edges being added are handled in a single call (update phis when adding BB preds), and may end up looking a lot like the DT update (vector of some "update" type).
- This call is adding Phis (if needed) in the BBs gaining predecessors and in the IDF of the blocks now defining accesses.
- Next, it's fixing domination by replacing uses of all defs that may no longer dominate their uses with the nearest dominating def.
- The algorithm to get this right is fairly complex and it does rely on the Def-Def optimizations being Uses, which is not currently the case.
- APIs to move accesses updated based on future usage in unswitch
- removeBlocks APis simplified in updater and in mssa.
I would appreciate high-level comments on the approach/algorithm first.
Rebase after checking in delete APIs.
Updated algorithm for updatePhisWhenAddingBBPred to consider any number of predecessors added for each block.
Updated cloneUseDef signature to use an LoopBlocksRPO.
Updated cloneUseDef to include an API with multiple VMaps, per the SimpleLoopUnswitch change.
Other cleanups.
Rebase on recent changes.
Added APIs now greatly simplified. Most updates are now done through applyUpdates.
I'm not entirely through this patch (I hope to make more progress in the coming few days), but this is what I have for now. Logic's looking solid so far; I'm just nitpicky. :)
include/llvm/Analysis/MemorySSAUpdater.h | ||
---|---|---|
104 | Do we mutate LoopBlocks/VM anywhere? If not, please constify them (here and in later new functions) | |
116 | Same 'please constify if possible' nit, though please make this an ArrayRef instead of const SmallVectorImpl & |
include/llvm/Analysis/MemorySSAUpdater.h | ||
---|---|---|
120 | nit: why do we take a &DT in this one and a *DT in the other? | |
236 | It's unclear to me what the relation between phis/definitions in "a map of MemoryPhi : Definition" is here. Please elaborate | |
lib/Analysis/MemorySSA.cpp | ||
1540 | Why || defined(LLVM_ENABLE_DUMP)? | |
1545 | nit: s/== true// | |
lib/Analysis/MemorySSAUpdater.cpp | ||
401 | naming nit: removeAllDuplicatePhiEdgesBetween? | |
422 | please use dyn_cast if MA shouldn't be null | |
423 | nit: this should only be null if MA is liveOnEntry. Please check against that instead, for clarity | |
427 | same "please don't use or_null if null isn't OK" nit. (though it looks like we could do else { cast<MemoryPhi> here, since we're checking for MUD above?) | |
438 | s/or_null// | |
481 | s/or_null// | |
482 | Please check against liveOnEntry instead | |
491 | s/or_null// (and we can probably change this into else { cast<MemoryPhi> anyway) | |
547 | can this be a range-for across ExitBlocks? | |
587 | nit: please either remove the braces after the if or add them after the else. I'm impartial | |
694 | probably redundant with the line below (unless it's possible for us to see the same BB repeatedly?) | |
699 | Looks like you can replace everything between { ... } with EdgeCountMap[{Pi, BB}]++;. operator[] on map-like containers generally acts like "return a reference if it exists. otherwise, construct or zero-init a member in the map and return a ref to that" | |
705 | same "looks like you can replace everything between ..." | |
717 | nit: There should probably be spaces before the end of each of these quotes | |
863 | nit: I think this is long enough that it's best to just have braces after the for and if on lines 837/838. :) |
lib/Analysis/MemorySSAUpdater.cpp | ||
---|---|---|
659 | I don't think this will be a bottleneck, but this can be optimized by starting the search at the node with the lowest level (highest in the tree). |
Lots of nits, and a few functional concerns. For the functional concerns that revolve around dubious edge additions, I'm happy with fixes or "it's impossible for that to actually happen; added a comment explaining that."
(If we do take the "that's impossible; have a comment" route, I assume that we're gating the new loop+MSSA transforms behind some loop-enable-mssa-like flag for a while to gain confidence about the correctness of this all?)
I also think we should have unittests (simple ones are perfectly OK) for the new updater functionality, please.
Like said, this review gave me a few things to do. I'll ping this with revisions when all of that's landed...
lib/Analysis/MemorySSAUpdater.cpp | ||
---|---|---|
426 | It's not immediately clear to me that this can't be tripped in general. e.g. void foo(int j, int *i) { if (j == 1 || j == 2) { *i = 0; } else { *i = 1; } } I realize this isn't a loopy case, but if we decided, in our infinite wisdom, that it was profitable to have a j == 1 block, a j == 2 block, and an else block, we'd end up with both the j == 1 and j == 2 block having DefMUDI == MSSA->getDefiningAccess(), no? | |
542 | nit: no braces, please | |
556 | nit: does this line need to be wrapped? (also, please remove the const in *const; it's inconsistent with the remainder of this file) | |
574 | nit: can this be an ArrayRef? | |
598 | nit: if (!Rev...empty()) | |
601 | nit: Is it possible/easy to update DT in-place and undo the updates? The tool I'm using to grok this says that this line always lands in CalculateWithUpdates, which appears to always call CalculateFromScratch. It sometimes gets things wrong with templated code, though, so I could be assuming the wrong thing here. | |
622 | (Can this be const, or are we updating it somewhere here?) | |
624 | nit: it looks like all the recursion here is tail recursion. Is it possible/easy to rewrite this as a loop with 100% fewer std::function::operator() invocations? :) | |
632 | nit: s/count/Count/ | |
659 | (if there's no reason to add it now, please add a quick TODO with kuhar's comment in case it becomes more desirable in the future) | |
660 | nit: const SmallPtrSetImpl<...> &? | |
662 | nit: auto * (here and elsewhere in the patch. In general, I think the accepted style for for loops is to prefer const auto & if it's a not-pointer, and const auto * if it's a pointer. Dropping const (and & in the first case) as necessary) | |
686 | It looks like we expect PrevPredMap and AddedPredMap to have the same set of keys, and they're sorta related. Would it make sense to have something like struct PredInfo { SmallPtrSet<...> Added; SmallPtrSet<...> Prev; }; SmallDenseMap<BasicBlock *, PredInfo> PredMap; instead, so the relationship is a bit more clear? (Or maybe one of these will need to be a SmallSetVector or similar, given the ordering concerns below?) | |
700 | To be sure I'm understanding (I'm not overly familiar with the DT APIs), this is saying "for all predecessors of BB, but also including additions in GD, { /* loop body */ }", yeah? | |
707 | nit: if (PrevBlockSet.empty()) | |
730 | FWIW, it looks like this'll make MSSA subtly (and in a way that doesn't matter until we print it) depend on set ordering, which may be bad for writing tests and such. Since we can iterate over elems in PrevPredMap in any order, we'll be creating these Phis in any order. Since we assign Phis IDs in the order in which they're created ... In this case, looks like we can just iterate directly over Updates and the problem goes away (assuming Updates is populated in a deterministic order)? | |
736 | nit: if you don't plan on mutating the &, make it a const &, please (and below) | |
738 | nit: !empty() | |
752 | Same (potential) nondeterministic ordering concerns. | |
757 | Consider: entry | a / \ | c \ / d a: ; 1 = MemoryDef(liveOnEntry) c: ; 2 = MemoryDef(1) d: ; 3 = MemoryPhi({c, 2}, {a, 1}) ; 4 = MemoryDef(3); optimized to 1 And we add an edge from entry -> d. I don't claim to know when/if this can reasonably happen, but ... | |
779 | Would it be more direct to just RAUW this Phi with DefP1, then remove it? | |
786 | Same ordering concerns. Maybe this time we could just iterate over BB's preds and fold these two loops into one? As for additions, we may need to have AddedPredMap preserve the order we observe from Updates? (That would also let us get rid of EdgeCountMap, it seems. Assuming we don't need it when we refactor the loop above) | |
804 | Is it possible for BlocksWithDefsToReplace to get duplicates? If so, might it be a good idea to replace that with a SmallPtrSet? | |
813 | (I'm assuming that there's some sort of guarantee that IDF calculations only pass these BBs into the DT. If it tries to access preds/succs of a BB, we might get badness, since the DT here may be pretending that we have CFG edges that we don't? Or did I miss us temporarily re-adding edges that were deleted between BBs?) | |
829 | Are we sure that we only need to consider BlocksWithDefsToReplace here? I could be totally misunderstanding, but imagine the following scenario (I'm not even going to try to invent an excuse for when this could happen. :) Just wanted to know if it was considered): entry / | a | / \ | b c f \ / | d | \ / e f: ; 1 = MemoryDef(liveOnEntry) e: ; 2 = MemoryPhi({d, liveOnEntry}, {f, 1}) Now imagine that we add an edge f -> c. AFAICT, that would cause us to add a Phi to c (way above) and a Phi to d (idf loop), but I'm failing to see where we'd update the Phi in e to point to the new Phi in d. (Since the only no-longer-dominating BB is a) Similarly, if d had a MemoryDef in it before we added this edge, would its DefiningAccess get pointed appropriately at this new Phi in d? | |
832 | nit: can we scope this more tightly please? e.g. BasicBlock *DominatedBlock = UsrPhi->... and similar in the else | |
836 | nit: Can we just do a range-for loop? Looks like we're only calling set on the use, which shouldn't cause any invalidation badness. | |
839 | s/dyn_cast/cast/. If a non-MemoryAccess is using a MemoryAccess, we have problems. :) | |
849 | The O(Blocks.size() * AverageBlockMemoryAccessesCount * AverageMemoryAccessUserCount * (Blocks.size() + IDF.size())) (assuming dominates is constant time) behavior here is somewhat concerning to me, though idk if that's an unfounded concern. If the user is a Def, it's probably best to just deoptimize it; we'll just start from the Def's defining access in the walker anyway. (...Or, well, we would if we noticed that the optimization is now invalid. For reasons unclear to me looking into history + comments, MemoryDef::isOptimized checks the Def's *defining* access, rather than its optimized access. Which is almost definitely incorrect, at least with the aliasing bits we now cache. Working on that now.) If it's not the Def's optimized access that we're dealing with, then the only new Access that can work is MSSA->getMemoryAccess(U->getBlock()); (since the only way that can happen is if that Def is the first access in a BB, and we've just inserted that block into said BB). If it's a Use, we'll try to reoptimize starting from its DefiningAccess, which will be whatever this Use points to. It's unclear what the right trade-off is there, since the more aggressive we are here, the less walking we *may* have to do in the future. Depending on the amount of luck involved in this loop's iteration order (e.g. if we remove the break;, is it possible for U to point to something different when the loop terminates? Is that value going to be better, worse, or are we uncertain? The feeling I get is "yes; uncertain," but I haven't fully reasoned through it,) it may be best to just say "point it at the phi of the Use's BB. If that doesn't exist, walk IDoms until we find the last Def." (Edit: ...Did I never make MemoryDefs stop using WeakVHes to track what they're optimized to? OK. I'll work on that in parallel, too, since I imagine that's pretty necessary for this to work.) |
Thanks a lot of the comments, working to address them.
A couple of quick clarifications.
Yes, enabling memoryssa is guarded by a flag, see dependent patches (unswitch, simple loop unswitch; the tests for those passes enable the flag so we do check correctness) and a couple of additional patches already landed to update memoryssa, again guarded and currently disabled, but the tests for those passes also enable the flag (the landed patches don't use any of the apis in this patch, hence no dependency).
comments-related:
Currently it does do CalculateFromScratch; it's creating a new DT based on a snapshot CFG where the "Delete" edges are undone. It updates MSSA only for "Inserts" added, then doing the "Deletes".
I have additional work I need to do to cleanup DT to use a GraphDiff internally as part of its update process (replace BatchUpdateInfo), and then I should be able to have this be an incremental update of doing the deletes in reverse (i.e. re-insert edges).
As for the IDF accessing succ/pred, it does! I taught it to use a snapshot graph too (GraphDiff passed as arg) in D50675.
Using a GraphDiff and the GraphTraits defined in CFGDiff.h should be the common interface for all.
Long-story short, there's a lot more work to be done before this is enabled :)
I'm planning to add a few more unittests, but sending the updates now for additional comments.
lib/Analysis/MemorySSAUpdater.cpp | ||
---|---|---|
601 | At the moment we are creating a new DT. I should be able to optimize this after cleaning up DT to use GraphDiff internally. For now, added a comment. | |
700 | Yes. The additions in GD in this case are edges that were removed and that we're reinserting for the scope of this update. | |
757 | Fixed. Even if a Phi exists, continue to the section below that updates defs in BlocksWithDefsToReplace. Added test case as well. | |
804 | We cannot get duplicates. We're walking up the DT, no cycles. | |
813 | The GD passes as argument to IDF takes care of that. | |
829 | This should be handled in the IDF loop above, fixed there. | |
836 | I may be missing something, but if I do: | |
849 | This should be fixed. Added testcase as well. |
Thank you for the patience on this review! :)
I'm doing some additional offline testing before landing (+ the nits you noticed).
Final update before commit: changing a cast_or_null to dyn_cast_or_null for cloned instructions. LoopRotate can place Value in the map, as the clone of an instruction.
I don't understand the desire here, since it already should correctly identify whether it is a def or use on it's own.
What are you trying to do?