This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] If provided, preserve Dominator Tree
ClosedPublic

Authored by lebedev.ri on Jan 15 2021, 2:29 PM.

Details

Summary

SimplifyCFG is an utility pass, and the fact that it does not preserve DomTree's,
forces it's users to somehow workaround that, likely by not preserving DomTrees's themselves.

Indeed, simplifycfg pass didn't know how to preserve dominator tree,
it took me just under a month (starting with e1133179587dd895962a2fe4d6eb0cb1e63b5ee2)
do rectify that, now it fully knows how to, there's likely some problems with that still,
but i've dealt with everything i can spot so far.

I think we now can flip the switch.

Note that this is functionally an NFC change, since this doesn't change the users
to pass in the DomTree, that is a separate question.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jan 15 2021, 2:29 PM
lebedev.ri requested review of this revision.Jan 15 2021, 2:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2021, 2:29 PM
kuhar added a comment.Jan 15 2021, 2:58 PM

Wow, this is fantastic. When I first started working on the domtree updater back in 2017, SimplifyGFG seemed like one of the most difficult passes to handle, and I wasn't sure if we ever get there. Very impressive work, @lebedev.ri!

This is awesome. And scary. This is scarily awesome. :)

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
263 ↗(On Diff #317069)

Why remove that? It looks useful to have (maybe over-paranoid) checks that DT is correct at least for the first time. I'd rather keep them for a while unless there is a reason to drop them now.

I'm somewhat skeptical about this change. The motivation seems a bit weak given the costs involved. The costs are:

  • Compile-time cost. Your best case estimate puts this at a 0.5% compile-time regression. This is for the case where SimplifyCFG simplify preserves DT, without using it. Once the DT is used, the DTU may be flushed many times while SimplifyCFG iterates. This will drive the average-case cost up, and likely introduce pathological cases as well.
  • Code complexity: At least watching from afar, your path to preserving DT in SimplifyCFG involved quite a few subtle issues, where the first implementation of DT preservation for a given transformation later turned out not to be entirely correct. Given the large number of very different transforms that SimplifyCFG performs, this adds a code complexity and maintenance cost.

The (listed) benefits are:

  • Not processing unreachable blocks. I am doubtful that we'll want to use the DT for this purpose, because that would require flushing the DTU on each SimplifyCFG iteration. You'd probably be better off running removeUnreachableBlocks each time. Though it's not clear that this is even a problem that needs solving.
  • Using DT in FoldBranchToCommonDest. This seems to be the real motivation here. I will have to dive deeper into the code to understand what the problem here is and whether it justifies the costs.
xbolva00 added a subscriber: xbolva00.EditedJan 18 2021, 12:13 PM

Code complexity

Was there any RFC or general agreement?

After many patches for this goal the pass is now more complex and harder to read.. it would be sad to revert month of work..

Wow. So much polarization here.

Wow, this is fantastic. When I first started working on the domtree updater back in 2017, SimplifyGFG seemed like one of the most difficult passes to handle, and I wasn't sure if we ever get there. Very impressive work, @lebedev.ri!

This is awesome. And scary. This is scarily awesome. :)

Why thank you.
It ended up being rather boringly trivial after all to me.
The hard part was coming up with the gradual roll-out part (the flag)

As for the rest of comments, let me split this up into parts.

  1. Should SimplifyCFG know how to preserve {,Post}DominatorTree
  2. Should SimplifyCFG make use of DominatorTree
  3. Should SimplifyCFGPass require DominatorTree

As for 1., for the life of me i can not take that question seriously.
Let me point out that SimplifyCFG is an *Utility*, it can be called from other passes,
and as is painfully visible from the diff, both the DwarfEHPrepare
and AMDGPUUnifyDivergentExitNodes do that already.

By keeping SimplifyCFG illiterate about DomTrees, we force it's users
to also avoid DomTrees to some extent, be it either to just not preserve them,
or recompute them each time after calling SimplifyCFG.

Is that really what we want to be doing?
So if such a question is seriously being asked,
i'd personally suggest to take a step back and really think about it.

3 - ignoring the cost of domtree preservation, this is basically free.
As far as i can tell from the tests, it doesn't result in any new domtree creations,
we'd always already calculate domtree after the simplifycfg invocation,
now we'll just do that earlier.

2 - well, yeah, that is a question of the patch, isn't it? :)
Not sure why it's being met with such surprisingly-hostile comments.

I'm somewhat skeptical about this change. The motivation seems a bit weak given the costs involved. The costs are:

  • Compile-time cost. Your best case estimate puts this at a 0.5% compile-time regression. This is for the case where SimplifyCFG simplify preserves DT, without using it. Once the DT is used, the DTU may be flushed many times while SimplifyCFG iterates. This will drive the average-case cost up, and likely introduce pathological cases as well.

Gentle reminder that i have already previously suggested to codify that
LLVM now prioritizes compile-time over everything.
Or something to that regard, about compile time.
I don't think that has happened yet, so i'm going to largely blatantly ignore this point.

  • Code complexity: At least watching from afar, your path to preserving DT in SimplifyCFG involved quite a few subtle issues, where the first implementation of DT preservation for a given transformation later turned out not to be entirely correct. Given the large number of very different transforms that SimplifyCFG performs, this adds a code complexity and maintenance cost.
  1. See above.
  2. This is true for practically every change, to some extent, so i'm not sure what the point is here.
  3. "Never attribute to malice that which is adequately explained by stupidity", or in this case, it's not that "later turned out not to be entirely correct", this was a *very* intentional roll-out approach, to weed out the cases that lacked test coverage.

The (listed) benefits are:

  • Not processing unreachable blocks. I am doubtful that we'll want to use the DT for this purpose, because that would require flushing the DTU on each SimplifyCFG iteration. You'd probably be better off running removeUnreachableBlocks each time. Though it's not clear that this is even a problem that needs solving.
  • Using DT in FoldBranchToCommonDest. This seems to be the real motivation here. I will have to dive deeper into the code to understand what the problem here is and whether it justifies the costs.

In the mean time, now that i'm done with the groundwork, i'll fix that the currently-preferred way.

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
263 ↗(On Diff #317069)

Note that this checks that the DomTree *provided by the pass manager* is valid.
Do we actually want to check that here?

nikic added a comment.Jan 18 2021, 3:05 PM

Wow. So much polarization here.

Wow, this is fantastic. When I first started working on the domtree updater back in 2017, SimplifyGFG seemed like one of the most difficult passes to handle, and I wasn't sure if we ever get there. Very impressive work, @lebedev.ri!

This is awesome. And scary. This is scarily awesome. :)

Why thank you.
It ended up being rather boringly trivial after all to me.
The hard part was coming up with the gradual roll-out part (the flag)

As for the rest of comments, let me split this up into parts.

  1. Should SimplifyCFG know how to preserve {,Post}DominatorTree
  2. Should SimplifyCFG make use of DominatorTree
  3. Should SimplifyCFGPass require DominatorTree

As for 1., for the life of me i can not take that question seriously.
Let me point out that SimplifyCFG is an *Utility*, it can be called from other passes,
and as is painfully visible from the diff, both the DwarfEHPrepare
and AMDGPUUnifyDivergentExitNodes do that already.

By keeping SimplifyCFG illiterate about DomTrees, we force it's users
to also avoid DomTrees to some extent, be it either to just not preserve them,
or recompute them each time after calling SimplifyCFG.

Is that really what we want to be doing?
So if such a question is seriously being asked,
i'd personally suggest to take a step back and really think about it.

3 - ignoring the cost of domtree preservation, this is basically free.
As far as i can tell from the tests, it doesn't result in any new domtree creations,
we'd always already calculate domtree after the simplifycfg invocation,
now we'll just do that earlier.

2 - well, yeah, that is a question of the patch, isn't it? :)
Not sure why it's being met with such surprisingly-hostile comments.

I'm not sure why my comment was taken to be hostile. All I want to say is that there's a tradeoff here, and I haven't been convinced that it's a good one yet.

I think your point that SimplifyCFG is a utility and may be used in passes that want to preserve DT is a good one. I can see an argument for supporting DT-preservation in SimplifyCFG, while not requiring it in SimplifyCFGPass and not allowing use inside SimplifyCFG itself. Do I understand correctly that DwarfEHPrepare may currently use an outdated dominator tree because SimplifyCFG does not preserve it? If that is the case, then that should certainly be addressed in some way.

I'm somewhat skeptical about this change. The motivation seems a bit weak given the costs involved. The costs are:

  • Compile-time cost. Your best case estimate puts this at a 0.5% compile-time regression. This is for the case where SimplifyCFG simplify preserves DT, without using it. Once the DT is used, the DTU may be flushed many times while SimplifyCFG iterates. This will drive the average-case cost up, and likely introduce pathological cases as well.

Gentle reminder that i have already previously suggested to codify that
LLVM now prioritizes compile-time over everything.
Or something to that regard, about compile time.
I don't think that has happened yet, so i'm going to largely blatantly ignore this point.

LLVM does not prioritize compile-time over everything. Compile-time regressions are fine as long as they are justified.

I have already stated my concern above. Preserving the DT is one thing, and a 0.5% compile-time regression is not overly problematic. However, the intention behind this change is clearly to also make use of DT inside SimplifyCFG, and that's where things become very problematic. I did a quick try of fetching the DT (and thus requiring a DTU flush) on every SimplifyCFG iteration (which is what would happen if we implemented your idea regarding unreachable blocks), and here's the result I got: https://llvm-compile-time-tracker.com/compare.php?from=22b68440e1647e16b5ee24b924986207173c02d1&to=879bffddcb45b02f3c819e542a327a5539999ea2&stat=instructions Now we go from 0.5% regression to 2.5%. consumer-typeset sees a 5% regression. The regressions are larger on individual files, e.g. libclamav_pe.c sees as >30% regression. This isn't peanuts. And these are still "average case" regressions on relatively benign benchmarks. You can bet that if you do this, you're going to see pathologically large regressions on generated code.

I feel like this is a valid concern to have, and I don't like the way you're just brushing this off completely. While it seems like common sense to me, I will look into updating our developer policy to make explicit mention of compile-time regressions, next to performance and correctness regressions.

  • Code complexity: At least watching from afar, your path to preserving DT in SimplifyCFG involved quite a few subtle issues, where the first implementation of DT preservation for a given transformation later turned out not to be entirely correct. Given the large number of very different transforms that SimplifyCFG performs, this adds a code complexity and maintenance cost.
  1. See above.
  2. This is true for practically every change, to some extent, so i'm not sure what the point is here.
  3. "Never attribute to malice that which is adequately explained by stupidity", or in this case, it's not that "later turned out not to be entirely correct", this was a *very* intentional roll-out approach, to weed out the cases that lacked test coverage.

Oh, sorry, looks like I misunderstood what was going on there. It didn't cross my mind that you would intentionally write incorrect DTU code in order to fix it later.

mkazantsev added inline comments.Jan 18 2021, 9:11 PM
llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
263 ↗(On Diff #317069)

This assert is meant to check that we actually pass it a DomTree immediately from the pass manager, and not some previously modified dom tree. It's up to you, but I'm a little bit paranoid when it comes to validation of analysis structures. It's never too late to remove redundant assertions.

lebedev.ri retitled this revision from [SimplifyCFG] Require and preserve dominator tree to [SimplifyCFG] If provided, preserve Dominator Tree.
lebedev.ri edited the summary of this revision. (Show Details)

Let me resell this. This patch is fully NFC,
it just signifies that SimplifyCFG is now DomTree-aware,
and preserves it if it is provided.

Here, i'm only looking for a congratulatory LGTM.
I'm sure some edgecases will pop up

nikic accepted this revision.Jan 27 2021, 11:21 AM

LGTM

This revision is now accepted and ready to land.Jan 27 2021, 11:21 AM

@kuhar @mkazantsev wanna get in on the hype train?

kuhar accepted this revision.Jan 27 2021, 12:13 PM

Choo choo

This revision was landed with ongoing or failed builds.Jan 28 2021, 3:12 AM
This revision was automatically updated to reflect the committed changes.