This is an archive of the discontinued LLVM Phabricator instance.

[PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch pass.
ClosedPublic

Authored by chandlerc on May 2 2017, 3:54 AM.

Details

Summary

The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.

The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.

Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (IE, least
code) way I could come up with, but that may be too naive. Suggestions
welcome here, dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.

In good news, with this patch, turning on simple unswitch passes the LLVM test
suite for me with asserts enabled.

Event Timeline

chandlerc created this revision.May 2 2017, 3:54 AM
sanjoy requested changes to this revision.May 6 2017, 7:09 PM
sanjoy added inline comments.
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
117

Minor stylistic thing: this lambda is large enough that I'd prefer pulling it out into a static helper.

I'd also call the parameter something more descriptive than Node.

127

This could be DomNodes.insert(DomNodes.end(), DomNodes[i]->begin(), DomNodes[i]->end());

135

I'd be tempted to sort and then binary search instead of building an intermediate set.

142

You're not really computing the dominance frontier here -- the condition for that would be !DomSet.count(SuccBB) || SuccBB == Node.

I think the algorithm is correct overall, but this difference from a textbook dominance frontier needs to be documented.

This revision now requires changes to proceed.May 6 2017, 7:09 PM
davide requested changes to this revision.May 8 2017, 11:47 AM

The algorithm you have, I think, it's correct to start (although as pointed out by Sanjoy doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
112

s/onte/on/ or am I wrong?

chandlerc marked 2 inline comments as done.May 11 2017, 7:51 PM

The algorithm you have, I think, it's correct to start (although as pointed out by Sanjoy doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
117

I somewhat prefer it as a lambda because that lets the underlying (and persisting) data structures be very elegantly captured rather than having to clutter up the interface. But if you feel strongly, I'll hoist it out.

That said, a better parameter name is always welcome. Any suggestions though? I couldn't come up with anything...

135

I have a lame reason to not do that: it makes the order of the worklist non-determinist because i have to sort by pointer order. At the point where I have to keep two lists, it seems like a set is just simpler code-wise.

142

Added a comment. In fact, we could "append" itself without changing behavior because the worklist is deduplicated and this node came from the worklist. But it seems good to document that we're intentionally not including this aspect of the dominance frontier.

chandlerc updated this revision to Diff 98719.May 11 2017, 8:01 PM
chandlerc edited edge metadata.

Update based on review comments.

dberlin edited edge metadata.EditedMay 12 2017, 5:48 AM

I have an intern starting in two weeks to do incremental dominator algorithms :)

The algorithm you have, I think, it's correct to start (although as pointed out by Sanjoy doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

TBQH, both of the comment seem quite wrong.

  1. Yes it computes dom tree levels each time. This is trivially fixable, but historically has never been a time sink.

You say:

//  It includes extra complexity and logic to allow PHI placement which is
 //    somewhat fundamentally a harder problem than the one we're trying to
 //    solve here, so we can get away with a simpler approach. Specifically,
 //    we don't need to do any live-in pruning.

No it isn't and no it doesn't :)
At least, the issues you describe are ... not a good reason to avoid it.
The problem of proper phi placement is not "fundamentally harder ". it's quite literally identical. :)
Second, the live-in pruning is completely optional, and costs nothing when it is off.
As an example, MemorySSA does not use the live-in pruning.

If you do not set the live in blocks, it simply does not use them.
This is why the comment says "/// By default, liveness is not used to prune the IDF computation."

You also could use them in your case if you wanted a smaller and more optimal answer.

(The mechanism you are currently using will go badly N^2 on certain types of CFG's)

sanjoy added inline comments.May 12 2017, 11:49 AM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
117

SGTM to the first.

I can't immediately think of something better than Node either, SGTM to the second too.

135

I don't see why non-determinism in the worklist order matters here.

sanjoy requested changes to this revision.May 19 2017, 4:19 PM

Getting this out of my review queue (I'm still waiting on the "I don't see why non-determinism in the worklist order matters here." bit).

This revision now requires changes to proceed.May 19 2017, 4:19 PM
chandlerc updated this revision to Diff 99845.May 22 2017, 9:54 PM
chandlerc edited edge metadata.

Update the comments to be more accurate and re-arrange the FIXME.

The homegrown dominance frontier logic is not great, but I don't think it's a significant blocker and can be refactored in the future.
i.e. I'm much more interested in understanding whether this thing stick together (including the non trivial unswitch, once you get to it).
I'm happy with this going in when Danny/Sanjoy are.

The homegrown dominance frontier logic is not great, but I don't think it's a significant blocker and can be refactored in the future.
i.e. I'm much more interested in understanding whether this thing stick together (including the non trivial unswitch, once you get to it).
I'm happy with this going in when Danny/Sanjoy are.

To clarify, I'm not talking about the code per-se, which I think is good. I mean that we have two pieces of code which could be shared but currently are not.

I have an intern starting in two weeks to do incremental dominator algorithms :)

Very excited to move this code to something better. But I'd also like to make progress here.

The algorithm you have, I think, it's correct to start (although as pointed out by Sanjoy doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

TBQH, both of the comment seem quite wrong.

  1. Yes it computes dom tree levels each time. This is trivially fixable, but historically has never been a time sink.

You say:

//  It includes extra complexity and logic to allow PHI placement which is
 //    somewhat fundamentally a harder problem than the one we're trying to
 //    solve here, so we can get away with a simpler approach. Specifically,
 //    we don't need to do any live-in pruning.

No it isn't and no it doesn't :)
At least, the issues you describe are ... not a good reason to avoid it.
The problem of proper phi placement is not "fundamentally harder ". it's quite literally identical. :)
Second, the live-in pruning is completely optional, and costs nothing when it is off.
As an example, MemorySSA does not use the live-in pruning.

If you do not set the live in blocks, it simply does not use them.
This is why the comment says "/// By default, liveness is not used to prune the IDF computation."

You also could use them in your case if you wanted a smaller and more optimal answer.

Ok, I've updated the comments. I agree these are fixable issues, but they're still issues. I'm happy to refactor this to share code if that's useful and I've updated the comment to reflect that we should factor things in a way that makes it easy to share.

The issue I was *trying* to get at in the second point was that the API isn't terribly convenient for this use case (but looks very much like the API that I'd want for PHI placement). Not a comment on the actual logic, I completely agree that could be shared. I have a mild preference to get the other domtree updates I'm going to need to do in this code written to better understand the API before rewriting the API used to access the IDF calculator logic just to have a better idea. And it isn't a lot of code (30 lines including lots of comments). But if you're really worried about having the logic local, I can plumb through the IDF interface bits.

(The mechanism you are currently using will go badly N^2 on certain types of CFG's)

Is this different from IDF calculator? Is that difference the domtree levels? If it is some other difference, I don't really understand it, and would appreciate an explanation. It's possible I'm just missing it, but I'm not seeing how the two meaningfully differ.

If it is just the levels, I'm not sure how to address this without adding a different N^2 due to walking the entire domtree rather than just a localized region. At that point, i'd rather just recalculate at the end of unswitching.

-Chandler

chandlerc added inline comments.May 22 2017, 10:09 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
135

That changes the order of children in the domtree, which in turn can change the output of anything that walks the domtree.

Typically, the domtree's order is fully determined by the order of the IR in the function. I'm not worried about having any particular order, but I'm worried about non-deterministic order as I suspect other passes transform code in ways influenced by the domtree visit order.

Does this seem unreasonable?

chandlerc updated this revision to Diff 99846.May 22 2017, 10:17 PM

Fix comment even harder.

sanjoy added inline comments.May 22 2017, 11:34 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
135

I missed that this would change the order of children. Given that, what's here LGTM.

FYI, I'm still waiting to hear back from you Danny...

In my subsequent patch, I actually start recomputing the entire dominator tree as the updates become far too complex to make sense. However, if possible I'd like to keep the update logic here as the next patch goes and immediately *queries* the dominator tree to do non-trivial unswitching cost estimation.

But I may take a shot at removing that and moving to *just* recomputing the dominator tree until we have the incremental update facilities properly integrated.

dberlin added a comment.EditedMay 24 2017, 9:12 PM

Over the past week or so, I think we have done enough playing around with incremental dominators at this point (and i believe jakub has working prototype code) that i feel confident enough that we can make it work, and so i'm not worried about this patch really.
Maybe it's code we destroy in a few months, but it doesn't seem worth holding up progress over.
It obviously could turn out i'm completely wrong (IE as we bring a design to upstream and start on a non-prototype, something becomes intractable), but we can still deal with it then.

Over the past week or so, I think we have done enough playing around with incremental dominators at this point (and i believe jakub has working prototype code) that i feel confident enough that we can make it work, and so i'm not worried about this patch really.
Maybe it's code we destroy in a few months, but it doesn't seem worth holding up progress over.
It obviously could turn out i'm completely wrong (IE as we bring a design to upstream and start on a non-prototype, something becomes intractable), but we can still deal with it then.

Cool.

I'm gonna land this and clear the way for some follow-up patches. I'm somewhat hopeful that I can even kill this off before we have proper updates as we'll have to recalculate the whole function's tree when doing non-trivial unswitching. But even if not, I think this is an OK stop-gap and I've got the FIXME in here to get something saner.

This revision was automatically updated to reflect the committed changes.