[Dominators] Add the DomTreeUpdater class
ClosedPublic

Authored by NutshellySima on Jun 20 2018, 11:31 AM.

Details

Summary

This patch is the first in a series of patches related to the RFC - A new dominator tree updater for LLVM.

This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.

—Prior to the patch—

  1. The current API is quite fragmented and different functions using these APIs need to write redundant code to manually deal with various kinds of update strategies which makes code hard to maintain.
    • Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
    • DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
    • Functions receiving DT/DDT need to branch a lot which is currently necessary.
    • Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
    • People need to construct an additional DeferredDominance class to use functions only receiving DDT.
  2. We need to manually decide whether to erase a BasicBlock from the dominator tree when one is removed from the CFG.
  3. When using lazy updating methods, the BasicBlock waiting to delete will be deleted in an unforeseeable time after being removed from the Function so the user cannot do further actions on it.
  4. When we have both trees (DomTree and PostDomTree), we can try using the update information of one of them to prune updates of another to accelerate the updating process.

—After the patch—

  1. The abilities of the DeferredDominance are subsumed by the new updater class. And the API user can use an enum to specify which update strategy to use.
  2. The new updater class can update all the data structures it holds when the user calls the update function.
  3. The new updater class will first check whether these BasicBlocks still act as a node in the holding trees then call delete to prevent further manual checks.
  4. The updater class has a callbackDeleteBB function which accepts a callable to let users do additional operations on the BasicBlock before deletion.

Diff Detail

Repository
rL LLVM
There are a very large number of changes, so older changes are hidden. Show Older Changes

Thank you for working on this @NutshellySima , a unified dominator interface will be most welcome in LLVM. Please see my inline comments.

include/llvm/IR/DomTreeUpdater.h
70 ↗(On Diff #152324)

The comment here doesn't make grammatical sense. Did you mean "It is required for the state of the LLVM IR to be updated *before* applying updates to the DominatorTree. This routine inspects the current IR edge relationships to determine if an update needs to be queued."

82 ↗(On Diff #152324)

I find the wording of this comment to be a bit confusing. I understand it's difficult to describe all the state you're trying to convey but maybe it's better to rewrite a bit. Here's more along the lines of what I was thinking:

/// Notify all holding trees on an edge insertion.
/// The internal Strategy of the DTU instance alters how the insertion even is handled.
///The Eager Strategy applies the update immediately while the Lazy Strategy will
/// queue them until a flush() event. If the current IR does not contain the edge
/// inserted the update will be silently discarded unless the user sets
/// ForceRemoveInvalidUpdate to true. It is recommended to only use this method
/// when you have exactly one insertion (and no deleteions) needed to the DTU.
/// It is recommended to use applyUpdates() in all other cases.
87 ↗(On Diff #152324)

Comments similar to insertEdge()...

142 ↗(On Diff #152324)

I'd actually prefer if flushDomTree() and flushPostDomTree(), and flushAll() were private members. Users shouldn't have to think about flushing state. Instead, have all delections and flushes happen when they request access to a DT or PDT reference such as:

auto DTU = DomTreeUpdater(DT, PDT, Lazy);
...
DTU->applyUpdates({Foo, Bar, Insert});
DTU->applyUpdates({Bar, Baz, Delete});
...
auto DT = DTU->getDT(); // flush() and delete BBs happen here automatically.
182 ↗(On Diff #152324)

It'd be good to mention why this is necessary in the comment. When immediately deleting a BB, the call removeFromParent() removes all its instructions as well. However, in the deferred case the block isn't really deleted and is still a child of F. This causes problems for algorithms that do for (auto BB: F) and expect terminator instructions, instruction Uses, and REPL to all be consistent for every block inside a function. Without these fixups all sorts of routines will start asserting() complaining that the state of the IR is inconsistent.

116 ↗(On Diff #152110)

You might want to add a comment explaining why deleted BBs need to be processed first:

/// If there are BasicBlocks pending deletion they must be removed first.
/// This is necessary because the pending BBs still have their Parent set to F
/// which is traversed by the tree's recalculate() algorithm and will assert()
/// if it detects a block in F that is not dominated by entry.
119 ↗(On Diff #152110)

I strongly dislike the idea of recalculation of DT or PDT by the caller. If the user has instantiated a DTU with DT and PDT I'd expect the class to quietly operate (and update) both. I cannot think of a situation where the caller would want DT updated but PDT not updated.

170 ↗(On Diff #152110)

+1

kuhar added inline comments.Jun 22 2018, 12:10 AM
include/llvm/IR/DomTreeUpdater.h
164 ↗(On Diff #152324)

Extra line, please

170 ↗(On Diff #152324)

Extra line, please

177 ↗(On Diff #152324)

Do you expect to have 8 callbacks here in average cases? Once you switch all users of DDT, how many callbacks registered will there be?

205 ↗(On Diff #152324)

nit: s/Tree/Trees

lib/IR/DomTreeUpdater.cpp
28 ↗(On Diff #152324)

Extra line, please

31 ↗(On Diff #152324)

Extra line, please

36 ↗(On Diff #152324)

Why not llvm::find?

44 ↗(On Diff #152324)

Extra line, please

47 ↗(On Diff #152324)

Extra line, please

63 ↗(On Diff #152324)

E should be const

65 ↗(On Diff #152324)

Extra line, please

68 ↗(On Diff #152324)

Extra line, please

74 ↗(On Diff #152324)

Extra line, please

106 ↗(On Diff #152324)

An assert for begin + index < end would be welcome

116 ↗(On Diff #152324)

I'd move such one-liners to the header

120 ↗(On Diff #152324)

Extra line, please

175 ↗(On Diff #152324)

Extra line, please

202 ↗(On Diff #152324)

Extra line, please

272 ↗(On Diff #152324)

Extra line, please

282 ↗(On Diff #152324)

Extra line, please

310 ↗(On Diff #152324)

Consider using llvm::find

388 ↗(On Diff #152324)

Extra line, please

456 ↗(On Diff #152324)

Extra line, please

319 ↗(On Diff #152110)

I think you missed this.
Same applies to the ifs below

334 ↗(On Diff #152110)

Missed?

Hello. Glad to see this is moving forward! This looks like it's going to be very useful.

Couple of quick, high level questions, in terms of how we expect the interface to be used. If there is a function that currently looks like this:

bool MergeBlockIntoPredecessor(BasicBlock *BB, DominatorTree *DT = nullptr,
                               LoopInfo *LI = nullptr,
                               MemoryDependenceResults *MemDep = nullptr,
                               DeferredDominance *DDT = nullptr);

.. and we'd like to make it optionally preserve PDT's, do you think it's best to have both an (optional) DT and a DTU? Just the DTU and get DT through it? Would the DTU remain optional or become required? (though possibly empty)

What about something like

BasicBlock *SplitBlockPredecessors(BasicBlock *BB, ArrayRef<BasicBlock *> Preds,
                                   const char *Suffix,
                                   DominatorTree *DT = nullptr,
                                   LoopInfo *LI = nullptr,
                                   bool PreserveLCSSA = false);

Which I think calls DT->splitBlock() internally. Do you think updating this just to take a DTU instead of a DT, and updating all the call sites is the best way to go?

include/llvm/IR/DomTreeUpdater.h
138 ↗(On Diff #152324)

Nit: Flush

142 ↗(On Diff #152324)

What do you think of having DTU automatically flush remaining updates in it's deconstructor. So that things like these (to steal your examples) work:

void F(..)
{
  auto DTU = DomTreeUpdater(DT, PDT, Lazy);
  ...
  DTU->applyUpdates({Foo, Bar, Insert});
  DTU->applyUpdates({Bar, Baz, Delete});
}

or

{
  auto DTU = DomTreeUpdater(DT, PDT, Lazy);
  SomeFunctionThatUpdatesTrees(..., DTU, ...);
}
201 ↗(On Diff #152324)

Nit: Flush

lib/IR/DomTreeUpdater.cpp
64 ↗(On Diff #152324)

asserts tend to give a filename and linenumber by default, so maybe change this to something describing the error.

kuhar added a comment.Jun 22 2018, 9:03 AM

Couple of quick, high level questions, in terms of how we expect the interface to be used. If there is a function that currently looks like this:

bool MergeBlockIntoPredecessor(BasicBlock *BB, DominatorTree *DT = nullptr,
                               LoopInfo *LI = nullptr,
                               MemoryDependenceResults *MemDep = nullptr,
                               DeferredDominance *DDT = nullptr);

.. and we'd like to make it optionally preserve PDT's, do you think it's best to have both an (optional) DT and a DTU? Just the DTU and get DT through it? Would the DTU remain optional or become required? (though possibly empty)

The idea is to have it only take DTU (by reference (non-optional)), such that MergeBlockIntoPredecessor preserves whatever you pass it (possibly no tree).

What about something like

BasicBlock *SplitBlockPredecessors(BasicBlock *BB, ArrayRef<BasicBlock *> Preds,
                                   const char *Suffix,
                                   DominatorTree *DT = nullptr,
                                   LoopInfo *LI = nullptr,
                                   bool PreserveLCSSA = false);

Which I think calls DT->splitBlock() internally. Do you think updating this just to take a DTU instead of a DT, and updating all the call sites is the best way to go?

In general. the preferred way would be take DTU as an argument and try to use the incremental api.
There are some cases where it seems like you can do some smarter things with more low-level operations, e.g. changing the function entry block, splitting predecessor, but from what I know there are only a couple of them. In such cases we can expose them as dedicated functions in DTU.

NutshellySima marked 2 inline comments as done.Jun 23 2018, 8:06 AM
NutshellySima added inline comments.
lib/IR/DomTreeUpdater.cpp
319 ↗(On Diff #152110)

No. Phabricator does not track the correct line between current and the last patch.

NutshellySima marked an inline comment as done.
  1. Address comments.
  2. Remove recalculateDomTree() and recalculatePostDomTree().
  3. Remove the "removeInvalidUpdate" parameter of insertEdge() and deleteEdge() and update docs to reflect this change.
  4. Rename recalculateAll() to recalculate() (It can recalculate all holding trees).
  5. Rename flushDomTree()/flushPostDomTree() to getDomTree()/getPostDomTree() because the functions indeed return the updated trees.
NutshellySima marked 35 inline comments as done.Jun 23 2018, 10:33 AM
NutshellySima marked an inline comment as not done.Jun 23 2018, 10:48 AM
NutshellySima added inline comments.
include/llvm/IR/DomTreeUpdater.h
142 ↗(On Diff #152324)

I think flushAll() can't be private members:

auto DTU = DomTreeUpdater(DT, Lazy);
... // Make BB0 forward/backward-unreachable.
DTU->applyUpdates(SomeUpdates);
auto DT = DTU->getDT();
...
// Now delete BB0
DTU->deleteBB(BB0);
// Now there is no pendingDTUpdates and can only use flushAll() to get BB0 removed from its parent and deleted.
NutshellySima added inline comments.Jun 23 2018, 10:58 AM
include/llvm/IR/DomTreeUpdater.h
142 ↗(On Diff #152324)

More information on the above comments:
I remember some code will break if the internal logic of deleteBB calls removeFromParent() and deletes BB immediately when there isn't any pending DT/PDT updates.
So, I think it is needed to leave a flushAll() function to help user flush BB awaiting deletion.

Couple of quick, high level questions, in terms of how we expect the interface to be used. If there is a function that currently looks like this:

bool MergeBlockIntoPredecessor(BasicBlock *BB, DominatorTree *DT = nullptr,
                               LoopInfo *LI = nullptr,
                               MemoryDependenceResults *MemDep = nullptr,
                               DeferredDominance *DDT = nullptr);

.. and we'd like to make it optionally preserve PDT's, do you think it's best to have both an (optional) DT and a DTU? Just the DTU and get DT through it? Would the DTU remain optional or become required? (though possibly empty)

The idea is to have it only take DTU (by reference (non-optional)), such that MergeBlockIntoPredecessor preserves whatever you pass it (possibly no tree).

+1. I'd like to see most, if not all, of the updates to PDT/DT to be done through the DTU interface applyUpdate() call. It simplifies method signatures and provides a consistent interface to analysis and updates.

What about something like

BasicBlock *SplitBlockPredecessors(BasicBlock *BB, ArrayRef<BasicBlock *> Preds,
                                   const char *Suffix,
                                   DominatorTree *DT = nullptr,
                                   LoopInfo *LI = nullptr,
                                   bool PreserveLCSSA = false);

Which I think calls DT->splitBlock() internally. Do you think updating this just to take a DTU instead of a DT, and updating all the call sites is the best way to go?

In general. the preferred way would be take DTU as an argument and try to use the incremental api.
There are some cases where it seems like you can do some smarter things with more low-level operations, e.g. changing the function entry block, splitting predecessor, but from what I know there are only a couple of them. In such cases we can expose them as dedicated functions in DTU.

In this case someone will have to do the flush() call during the Lazy Strategy. Either the call of SplitBlockPredecessors() will have to do:

auto DT = DTU->getDT(); // flush, deleteBBs called
auto NewBB = SplitBlockPredecessors(BB, Preds, Suffix, DT);

or

auto NewBB = SplitBlockPredecessors(BB, Preds, Suffix, DTU);
...
// Inside SplitBlockPredecessors()
auto DT = DTU->getDT();
DT->splitblock();
...

The latter example has three benefits in my opinion:

  1. the details of DT/DTU are hidden from the caller
  2. SplitBlockPredecessors now preserves (and u[dates) PDT for free
  3. There might also be a way to update the DT/PDT without the flush() call which can be investigated without altering all the callers of SplitBlockPredecessors()
include/llvm/IR/DomTreeUpdater.h
142 ↗(On Diff #152324)

That would be a good idea to force flushes (and deleteBB) on deconstruction iff any are pending.

142 ↗(On Diff #152324)

You make a good point here, I think the big flush needs to be public. Can't it just be called flush() or sync() instead of flushAll()? I'm still hopeful the need for this call can be minimized by the DT/PDT getter method implicitly flushing and more call-sites only use DTU.

NutshellySima marked 4 inline comments as done.Jun 27 2018, 10:43 AM
NutshellySima added inline comments.Jun 27 2018, 11:37 AM
include/llvm/IR/DomTreeUpdater.h
182 ↗(On Diff #152324)

It seems that calling removeFromParent() does not remove all its instructions.

116 ↗(On Diff #152110)

It seems that DT.recalculate() won't assert on this case.

  1. Make DomTreeUpdater flush() on deconstruction.
  2. Add insertEdgeRelaxed() and deleteEdgeRelaxed() to cover the behavior of the original DeferredDominance's insertEdge()/deleteEdge().
  3. Modify unittests to test the 2 new functions.
NutshellySima marked 2 inline comments as done.Jun 27 2018, 11:46 AM

Thank you for continuing this work @NutshellySima .

  1. Make DomTreeUpdater flush() on deconstruction.

Excellent, this is a nice convenience.

  1. Add insertEdgeRelaxed() and deleteEdgeRelaxed() to cover the behavior of the original DeferredDominance's insertEdge()/deleteEdge().

Maybe it'd be better to have a default boolean third variable?
deleteEdge(BasicBlock *From, BasicBlock *To, bool Relaxed = false) { ...

include/llvm/IR/DomTreeUpdater.h
182 ↗(On Diff #152324)

Correction, I meant llvm::DeleteDeadBlock(). You'll see a similar loop to DDT->deleteBB() which does a RAUW to undef for all instructions and then calls removeFromParent(). The biggest change that deleteBB() does is adds a valid TI: necessary for llvm to keep the block as a child of F().

kuhar added a comment.Jun 27 2018, 2:40 PM
  1. Add insertEdgeRelaxed() and deleteEdgeRelaxed() to cover the behavior of the original DeferredDominance's insertEdge()/deleteEdge().

Maybe it'd be better to have a default boolean third variable?
deleteEdge(BasicBlock *From, BasicBlock *To, bool Relaxed = false) { ...

If we agree that it would be better to have well-behaved updates in most of the places in the future, then having distinct names for these two should make it much easier to grep for. Additionally, it seems like it's not always possible to perform them with the eager strategy, is it?

If we agree that it would be better to have well-behaved updates in most of the places in the future, then having distinct names for these two should make it much easier to grep for. Additionally, it seems like it's not always possible to perform them with the eager strategy, is it?

Good point, I didn't think of the calls being easier to grep. Sounds good to me. And yes, I would prefer all updates to be strict and done after CFG updates, but reality isn't always so forgiving. Having explicit *Relaxed() calls can make them easier to identify and potentially fix as we go.

NutshellySima marked 2 inline comments as done.Jun 28 2018, 7:08 AM
  1. Address comments - Modify document of validateDeleteBB(BasicBlock *DelBB).
  2. Fix some typos.
  3. Remove an unimplemented function.
NutshellySima marked 3 inline comments as done.Jun 28 2018, 8:17 AM
kuhar added a comment.Jun 28 2018, 8:55 AM

Looks almost ready to me. Found just a couple of nits.

include/llvm/IR/DomTreeUpdater.h
58 ↗(On Diff #153326)

I'm not sure that holding trees is grammatically correct in this place and all the other ones.
Maybe held trees or just available trees.
Perhaps @brzycki could help here.

130 ↗(On Diff #153326)

please remove be

157 ↗(On Diff #153326)

Can you add one extra sentence saying what it actually means to flush?

lib/IR/DomTreeUpdater.cpp
36 ↗(On Diff #152324)

Missed comment?

248 ↗(On Diff #153326)

Can a BB have 0 instructions?

@kuhar and @NutshellySima here are a few more small comments.

include/llvm/IR/DomTreeUpdater.h
58 ↗(On Diff #153326)

The phrase holding trees doesn't make sense in this context. I didn't mention it before because I'm not an expert on Dominators and wasn't sure if this was a technical phrase. :)

If you need to keep the phrase I'd recommend held trees but even that sounds a bit odd. Something along the lines of valid tree instances or trees contained inside. I tend to think of the DTU object instantiated with one or two tree objects.

In this particular case the english is odd when describing the false case. I'd prefer if the comment said:

/// Returns true if either of DT or PDT is valid and the tree has at
/// least one update pending. If DT or PDT is nullptr it is treated
/// as having no pending updates.

Something like that I think is a bit easier to understand.

60 ↗(On Diff #152110)

+1. I'd assume for Eager the routine always returns false since queuing isn't allowed. It'd be good to state that explicitly.

lib/IR/DomTreeUpdater.cpp
248 ↗(On Diff #153326)

Not for long. The only place I've seen BBs with 0 instructions is just before actual deletion. Too many routines expect, at the very least, a valid TI and will cause assert() casting nullptr to I or dereferencing a nullptr instruction instance. As much as anything's a rule in the middle end, I've found that to be pretty consistent.

  1. Address comments.
  2. Fix build with C++ module.
NutshellySima marked 8 inline comments as done.Jun 29 2018, 10:17 AM
kuhar added inline comments.Jun 29 2018, 10:20 AM
lib/IR/DomTreeUpdater.cpp
34 ↗(On Diff #153521)

Still missed? Can we use llvm::find here?

NutshellySima added inline comments.Jun 29 2018, 11:26 AM
lib/IR/DomTreeUpdater.cpp
34 ↗(On Diff #153521)

It was commented when I used std::any_of last time. I then switched to llvm::any_of. I believe llvm::any_of is clearer than llvm::find because after llvm::find returns an iterator, and I need to have it compared with the succ_end to find out the value of this boolean.

kuhar added inline comments.Jun 29 2018, 11:27 AM
lib/IR/DomTreeUpdater.cpp
34 ↗(On Diff #153521)

OK

NutshellySima marked 3 inline comments as done.Jun 29 2018, 11:29 AM

Fix issues when building LLVM with C++ modules (Thanks Richard Smith for the advice).

kuhar accepted this revision.Jul 1 2018, 6:21 PM

Looks good to me now. Thanks for making all the fixes!

This revision is now accepted and ready to land.Jul 1 2018, 6:21 PM

+1. Nice work guys.

brzycki accepted this revision.Jul 2 2018, 7:49 AM

+1 LGTM. Thank you @NutshellySima for your dedication to getting this patch ready for adoption!

This revision was automatically updated to reflect the committed changes.
NutshellySima reopened this revision.Jul 2 2018, 12:54 PM

Reopen for a stack-use-after-scope bug in DomTreeUpdater_LazyUpdateBasicOperations_Test.

This revision is now accepted and ready to land.Jul 2 2018, 12:54 PM

Use std::function instead of function_ref in callback utils.

This revision was automatically updated to reflect the committed changes.