Page MenuHomePhabricator

[CVP] Require DomTree for new Pass Manager
ClosedPublic

Authored by dmgreen on May 15 2018, 10:20 AM.

Details

Summary

We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. This now gets DT directly and plumbs it through to where it is
used (instead of using it through SQ)

Diff Detail

Event Timeline

dmgreen created this revision.May 15 2018, 10:20 AM
dmgreen added inline comments.
lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
162

Required here :)

There must be a way to test this? Ie, how did you notice this bug?

Ah, test, yes. I found this with a downstream pass that doesn't preserve DT's like jump threading does. I'll see if I can come up with something.

chandlerc added inline comments.May 15 2018, 5:14 PM
lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
762–765

I really don't like this use of getBestSimplifyQuery. I would strongly suggest passing the analyses you want to pass into SimplifyQuery. Not doing that makes the pass subtly dependent on pass ordering and other vagaries.

dmgreen updated this revision to Diff 147048.May 16 2018, 4:31 AM
dmgreen added a reviewer: dberlin.

I've done some plumbing to pass DT directly through to where it is used. The only other use of SQ is into SimplifyInstruction. I hope this makes the use of getBestSimplifyQuery OK. If not, which analyses should be used to create it? All of them (DT, DL, TLI and AC)? And why do we have getBestSimplifyQuery? ;)

I've also added new pass manager run lines to a couple of tests. Add.ll fails without requiring the DT.

dberlin added a comment.EditedMay 16 2018, 7:53 AM

So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.

This was used to replace something even *more* vague, where passes were randomly plumbing or not plumbing pieces they had to simplifyInstruction various places, and were doing it quite wrong.
It was generally considered a significant improvement over what existed before.
You can look at what it replaced and make that decision,.

All that said, it does not *require* anything.
Nor does SimplifyInstruction, which was deliberately built to not require a DomTree to function properly, and is used in places where it is not up to date or correct.
Hence, getBestSimplifyQuery uses what is available, as does the SimplifyQuery structure and SimplifyInstruction.

It is only meant to be used with SI, and using it elsewhere is a bad plan.

Otherwise, i disagree with Chandler strongly to go back to constructing SimplifyQuery's directly. It was wrong often enough that this was clearly not a good way of doing things, IMHO.

dmgreen edited the summary of this revision. (Show Details)May 16 2018, 9:16 AM

I may have sowed some confusion here by not updating commit messages as the code changed.

The code here in CVP currently:

  • use getBestSimplifyQuery to get the best SQ available (but not require a DT in the NPM)
  • use the SQ in SQ.DT->dominates (causing the problem) and SimplifyInstruction(.., SQ)

With this patch it:

  • gets DT directly and passes it through to where it's needed (i.e DT->dominates)
  • uses getBestSimplifyQuery only for SimplifyInstruction(.., SQ)
  • no longer uses SQ.DT directly

Everyone happy (enough) with this state of affairs?

dberlin accepted this revision.May 18 2018, 8:49 AM

I may have sowed some confusion here by not updating commit messages as the code changed.

The code here in CVP currently:

  • use getBestSimplifyQuery to get the best SQ available (but not require a DT in the NPM)
  • use the SQ in SQ.DT->dominates (causing the problem) and SimplifyInstruction(.., SQ)

Yeah, this is a bad idea :)

With this patch it:

  • gets DT directly and passes it through to where it's needed (i.e DT->dominates)
  • uses getBestSimplifyQuery only for SimplifyInstruction(.., SQ)
  • no longer uses SQ.DT directly

This is a good idea ;)

Everyone happy (enough) with this state of affairs?

I am for sure.

I'll mark this accepted, but please wait until monday to see if anyone has any more concerns.

This revision is now accepted and ready to land.May 18 2018, 8:49 AM

I'm happy as long as we have a test. :)

I haven't stepped through this to understand actually what happens here.

  1. In the add.ll file, the new PM crashes. Explain that with a comment in the test file and/or commit message?
  2. I don't see any difference in phi-common-val.ll - all of the auto-generated assertions pass even without this patch. Augment the run/assertions to show the failure (if it's visible there some other way)?

So, this seems very confused.
getBestSimplifyQuery simply queries the analysis manager to see what passes are available.

This was used to replace something even *more* vague, where passes were randomly plumbing or not plumbing pieces they had to simplifyInstruction various places, and were doing it quite wrong.
It was generally considered a significant improvement over what existed before.
You can look at what it replaced and make that decision,.

All that said, it does not *require* anything.
Nor does SimplifyInstruction, which was deliberately built to not require a DomTree to function properly, and is used in places where it is not up to date or correct.
Hence, getBestSimplifyQuery uses what is available, as does the SimplifyQuery structure and SimplifyInstruction.

It is only meant to be used with SI, and using it elsewhere is a bad plan.

Otherwise, i disagree with Chandler strongly to go back to constructing SimplifyQuery's directly. It was wrong often enough that this was clearly not a good way of doing things, IMHO.

I think we're miscommunicating about this.

I'm not necessarily suggesting constructing SimplifyQuery directly. I'm not trying to suggest any previous API is good.

I'm trying to point out that we're going to have a debugging and stability problem if we rely heavily on getting "available" passes in the new PM that is significantly different from the old PM. I'll try to explain the underlying problem here in a bit more detail so that we're on the same page.

In the old PM, getting the aviailable passes was a really nice approach. Because the schedule of pass runs was decided up-front, getting available analyses essentially said "whatever the pass pipeline looks like, use what is there". This is a great optimization and avoids lots of overly tight coupling between passes and their schedule. But the results were *extremely predictable* -- there would be exactly one set of analyses available for a pass in a given position in the pipeline. I see zero problems here, and the API we're discussing does indeed seem way, way better than manually building SimplifyQuery.

But in the new PM, we have a very different situation. Because passes are cached, the "available" thing is really subtle. It isn't static, it is dynamic. One module may have that analysis available, while a *very* subtly different module won't. You can even have action at a distance with this, where one function changes in the module, and on a completely unrelated function the analysis is no longer available. I see at least three issues with this:

  1. It will make it very hard to debug missed optimizations because it is much more complicated to reason about. For example, it will make bugpoint reduction of failures substantially harder as now you'll end up needing to preserve lots of very inexplicable things to get particular behaviors to occur
  2. If we ever want to add memory usage limiting functionality to the new PM (which everyone was expecting in the early days but hasn't been needed thus far), those memory saving clears of the cache *will change optimization power*.
  3. It may in rare cases hurt users where seemingly benign refactorings and changes cause the optimizer to improve and regress in ways they can't understand and don't expect.

Now, all I'm trying to say here is that getting available analyses in the new PM is substantially more risky than in the old PM. I'm not suggesting any particular API approach. Just saying that in the new PM, especially with analyses being somewhat cheaper due to caching, we should be skewing much, much further towards requiring analyses rather than getting them if available.

That still could mean the raw SimplifyQuery API is bad, and that we need a better API. Totally on board with that. I'm just hoping that, to the extent possible, we can design APIs that actually require the analyses in the new PM rather than using whatever happens to be sitting in the cache.

And that also doesn't mean this patch isn't an incremental improvement over the prior state necessarily. I'm perfectly happy if we need to fix this in a follow-up. I just want people to be aware of and understand the risk and issues here.

chandlerc added inline comments.May 18 2018, 10:22 AM
lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
760

And this makes it somewhat obvious that this patch is an incremental improvement -- it removes one dynamic aspect of this. So, this patch definitely LGTM, but I think we also want to think about what a better API for building SimplifyQuery looks like in the new PM world where we have more of these issues around cached analyses being unpredictable.

dmgreen updated this revision to Diff 147751.May 21 2018, 4:07 AM

Thanks for the info. I see the problem. I will commit this, to make it so this isn't crashing, and we can go from there.

I've changed the test here to be a new test in phi-common-val.ll that shows the error. No longer using the fact that add.ll happened to crash.

dmgreen closed this revision.May 21 2018, 4:17 AM

rL332836

I also forgot to mention I took a quick look and it appears that getBestSimplifyQuery is only used in two places, here in CVP and LoopRotate.

As LoopRotate is a loop pass, it will always have access to all the analyses via LoopStandardAnalysisResults.

I'm pretty sure that TLI, and AC are all so cheap we could just always require them in getBestSimplifyQuery (at least without paying any cost in getBestSimplifyQuery, I guess they could still cause more cost in InstructionSimplify). That would still leave what to do about DT.