This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] Preserve DT and LVI across the pass.
ClosedPublic

Authored by brzycki on Sep 6 2017, 1:03 PM.

Details

Summary

JumpThreading now preserves dominance and lazy value information across the entire pass. The pass manager is also informed of this preservation with the goal of DT and LVI being recalculated fewer times overall during compilation.

This change prepares JumpThreading for enhanced opportunities; particularly those across loop boundaries.

Patch by Brian Rzycki and Sebastian Pop.

Diff Detail

Event Timeline

brzycki created this revision.Sep 6 2017, 1:03 PM
davide added a subscriber: davide.Sep 6 2017, 2:08 PM
davide added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
108–109

Now that the domtree is preserved, you can probably also preserve LVI.
The reason why it was disabled is that LVI holds a pointer to DomTree, and when you free the dominator analysis result, you end up with garbage.

kuhar edited edge metadata.Sep 6 2017, 4:28 PM

This patch uses the "!find-push_back" pattern a lot:

DominatorTree::UpdateType UT = {DominatorTree::Delete, *PI, BB};
if (std::find(Updates.begin(), Updates.end(), UT) == Updates.end()) {
  Updates.push_back(UT);

which made me think that maybe it would be better to offer some better interface for applying updates that would automatically get rid of the duplicate ones? Or maybe it would make sense to just have a second overload that accepts set-like containers.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
1959

If there are exactly 3 updates every time, maybe it'd make sense to use SmallVector<DT::UT, 3> or just an array here?

2524

(Same us above)
Maybe worth using SmallVector/array?

llvm/lib/Transforms/Utils/Local.cpp
183

Why not some standard algorithm? I.E. std::find or std::none_of

1821

applyUpdates should work with empty update set, but I guess it makes no real difference here.

This patch uses the "!find-push_back" pattern a lot:

DominatorTree::UpdateType UT = {DominatorTree::Delete, *PI, BB};
if (std::find(Updates.begin(), Updates.end(), UT) == Updates.end()) {
  Updates.push_back(UT);

which made me think that maybe it would be better to offer some better interface for applying updates that would automatically get rid of the duplicate ones? Or maybe it would make sense to just have a second overload that accepts set-like containers.

Hi Kuba, thanks for taking the time to review. Since this patch is probably the first major consumer of your ABI there might be opportunities to learn here. I'm a bit torn as to which way it should go.

In the current implementation you have strong assert() checks for the following:

  • deleting an edge that still exists in the CFG
  • attempting to delete or insert the same edge more than once
  • attempting to insert an edge to itself { Insert, BB, BB }

Those three reasons are primarily why you see the gating code so often repeated. I hit every one of those problems during make check debug sessions. I actually consider these asserts() to be a good thing since they didn't let errors quietly continue and complicate gdb debugging. The down side is it does clutter the call sites a bit with extra checks.

I'm just brainstorming here but maybe one option is to only assert when running a non-release build. The rest of the time those three cases are simply ignored and you do an early return. But this makes me a little uneasy as it allows for error propagation on new code only tested with Release builds...

Even with this added overhead I find the new interface much easier to use than the old one. When I started this patch a few months ago your code wasn't available and I had several custom functions to fix up the dominator tree. It was slow going and highly error-prone.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
108–109

Hi David, I'm hopeful that this (and other benefits) can be obtained with dominance preserved across JumpThreading. This is an important optimization in LLVM and there are more opportunities I'd like to see it catch.

For now I'd like to keep this patch just focused on DT. Let's talk after this gets accepted to preserve LVI.

1959

I copied your example code in the new dominator tree unit tests. I'd be happy to change this to whichever you think makes more sense between SmallVector or an Array.

2524

Yup, I agree.

llvm/lib/Transforms/Utils/Local.cpp
183

I'm a better C programmer than a C++ one. :)

I'll try both STL calls to see which one feels more natural in context.

1821

I did this because Updates.empty() is a very fast call and wanted to avoid the potential of calling applyUpdates with no work to do. I can take it out if you think that's over-optimizing.

One other comment: some of the layout of this patch was designed based on feedback between us in email. I asked you if applyUpdates() was more expensive for a single Insertion/Deletion than a direct call. Were I to assume otherwise some of the calls and logic would be simpler. I'd always create an Updates array and just push_back() throughout the function. On exit I'd call applyUpdates() with an empty vector or not. If you think code clarity trumps the (small?) performance deltas I'd be open to revisiting some of the call sites.

kuhar added a comment.Sep 7 2017, 12:11 PM

Hi Kuba, thanks for taking the time to review. Since this patch is probably the first major consumer of your ABI there might be opportunities to learn here. I'm a bit torn as to which way it should go.

In the current implementation you have strong assert() checks for the following:

  • deleting an edge that still exists in the CFG
  • attempting to delete or insert the same edge more than once
  • attempting to insert an edge to itself { Insert, BB, BB }

Those three reasons are primarily why you see the gating code so often repeated. I hit every one of those problems during make check debug sessions. I actually consider these asserts() to be a good thing since they didn't let errors quietly continue and complicate gdb debugging. The down side is it does clutter the call sites a bit with extra checks.

I'm just brainstorming here but maybe one option is to only assert when running a non-release build. The rest of the time those three cases are simply ignored and you do an early return. But this makes me a little uneasy as it allows for error propagation on new code only tested with Release builds...

I don't think having different behavior in debug and release is a good idea. I'd rather see it as either a separate function or a helper utility that lifts these restrictions and automatically deduplicates it, but as you noticed, the current strict API makes it more difficult to make mistakes when collecting updates to apply. Perhaps something worth thinking about would be adding a helper function to JumpThreading itself, eg. addUpdateIfNew(CurrentUpdates, NewUpdate)? For now, it would be only used in JumpThreading anyway.

Even with this added overhead I find the new interface much easier to use than the old one. When I started this patch a few months ago your code wasn't available and I had several custom functions to fix up the dominator tree. It was slow going and highly error-prone.

Good to hear that!

llvm/lib/Transforms/Scalar/JumpThreading.cpp
1959

applyUpdates takes ArrayRef as a parameter, so all 3 of vector, SmallVector, and array should do. The thing with vector is that it will probably allocate, whereas the other two won't. I'd personally go with std::array or SmallVector here.

llvm/lib/Transforms/Utils/Local.cpp
1821

I think it's fine as-is.

kuhar added a comment.Sep 7 2017, 12:16 PM

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

brzycki updated this revision to Diff 114284.Sep 7 2017, 4:48 PM

Preserves LVI across JumpThreading.

Removed unnecessary std::vector calls.

Fixed a bug when normal and unwind destinations were the same for one case.

Fixed the JumpThreading declaration of preserved/requires to make sure the dominator tree happens first so that LVI will properly instantiate the private LVI->DT variable.

brzycki marked 10 inline comments as done.Sep 7 2017, 4:53 PM

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

I have not. I'll talk to @sebpop and see if he thinks this makes sense to include in this patch. Depending on our follow-on updates to JT we may need post dominators as well.

dberlin edited edge metadata.Sep 7 2017, 5:20 PM

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

I have not. I'll talk to @sebpop and see if he thinks this makes sense to include in this patch. Depending on our follow-on updates to JT we may need post dominators as well.

Even if you don't need it, it's trivial to preserve and probably worth doing.

Jakub, we should just create an updater object that updates whatever is around with the same edge sequence.

if you stare at getBestSimplifyQuery in lib/Analysis/SimplifyInstruction.cpp, you can see how to do that with both the old and new PM.

You can see how it is used in Transforms/Scalar/CorrelatedValuePropagation.cpp and Transforms/Scalar/LoopRotation.cpp for both the old and new PM.
An updater object that gets the postdom/dom tree if it's around, and handles the update edge sequence for both, would go a long way to getting postdom preserved everywhere.

kuhar accepted this revision.Sep 7 2017, 5:47 PM

LGTM after the fixes, but you may want to wait for a green light from someone else.

llvm/lib/Transforms/Utils/Local.cpp
672–673

nit: s/replaceEntryBB/ReplaceEntryBB

This revision is now accepted and ready to land.Sep 7 2017, 5:47 PM
sebpop edited edge metadata.Sep 8 2017, 6:53 AM

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

I have not. I'll talk to @sebpop and see if he thinks this makes sense to include in this patch. Depending on our follow-on updates to JT we may need post dominators as well.

Even if you don't need it, it's trivial to preserve and probably worth doing.

Agreed. Preserving post-dominators will also save some compilation time which is always good to get.

brzycki updated this revision to Diff 114382.Sep 8 2017, 9:56 AM

Use none_of instead of custom loop.

Fixed grammatical case issues.

brzycki marked 3 inline comments as done.Sep 8 2017, 9:59 AM

@brzycki, one related thing: have you tried preserving postdominators as well? I'm not sure if that would be beneficial here, but in theory, it should be pretty simple, as you can pass exactly the same update sequences to the DominatorTree and PostDominatorTree.

I have not. I'll talk to @sebpop and see if he thinks this makes sense to include in this patch. Depending on our follow-on updates to JT we may need post dominators as well.

Even if you don't need it, it's trivial to preserve and probably worth doing.

Agreed. Preserving post-dominators will also save some compilation time which is always good to get.

@sebpop and @dberlin I'll start looking at this. Depending on the complexity of the updates for PDT at the DT call-sites the work may warrant a separate patch on top of this one.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
108–109

@davide The latest uploaded diff preserves LVI as well as DT.

brzycki updated this revision to Diff 116665.Sep 26 2017, 7:37 AM
brzycki marked an inline comment as done.

Rebase to tip as of 9/25. Eugene Zelenko's commit using Clang-tidy modernize-use-using required manual fixes.

Fixed a corner-case bug in MergeBasicBlockIntoOnlyPred() where DestBB could be requested to dominate itself, triggering an assert in the dominance insert code.

Rebase to tip as of 9/25. Eugene Zelenko's commit using Clang-tidy modernize-use-using required manual fixes.

Fixed a corner-case bug in MergeBasicBlockIntoOnlyPred() where DestBB could be requested to dominate itself, triggering an assert in the dominance insert code.

I also investigated preserving PDT across JumpThreading. It's turning out to be less trivial than originally anticipated. There are corner-cases of PDT->deleteEdge() that do not work automatically when basic blocks are about to be deleted from a function. I also have to update the PDT preservation algorithm for several external functions:

  • SplitBlockPredecessors()
  • SplitEdge()
  • SplitBlockAndInsertIfThen()
  • computeKnownBits()
  • (and possibly more that I missed)

For these reasons I'd prefer to preserve PDT in a separate patch on top of this one. If @kuhar and @dberlin are OK with this I'd like to move forward with merging this patch as-is.

kuhar accepted this revision.Sep 26 2017, 6:01 PM

LGTM

LGTM

Thank you @kuhar . @dberlin are you OK with the patch? If so I can ask one of my peers to push it into tip. Thank you both for making this patch better.

dberlin accepted this revision.Sep 28 2017, 8:10 AM
brzycki retitled this revision from [JumpThreading] Preserve dominance across the pass. to [JumpThreading] Preserve DT and LVI across the pass..Sep 28 2017, 9:11 AM
brzycki edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.
brzycki reopened this revision.Jan 5 2018, 8:32 AM
brzycki added a subscriber: rnk.
This comment was removed by brzycki.
This revision is now accepted and ready to land.Jan 5 2018, 8:32 AM
brzycki closed this revision.Jan 5 2018, 8:33 AM

I accidentally re-opened the wrong diff.