Details

Reviewers

sanjoy
davide
dberlin

Commits

rGdd2e275a47fc: [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch pass.
rL303843: [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch

Summary

The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.

The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.

Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (i.e., least
code) way I could come up with, but that may be too naive. Suggestions are
welcome here; dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.
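To make the localized frontier computation concrete, here is a minimal C++ sketch on a toy CFG. It is not the patch's code. Block IDs are plain `int`s rather than `BasicBlock *`, the dominator tree is a hypothetical `IDom` array (with `-1` for the root), and `domChildren` and `appendLocalFrontier` are made-up names. Like the patch, it flattens the dominator subtree in pre-order, puts it in a set, and appends every successor that falls outside that set to a deduplicated worklist.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy CFG (hypothetical): Succs[B] lists the successors of block B.
using Graph = std::vector<std::vector<int>>;

// Build dominator-tree child lists from an IDom array (IDom[root] == -1).
Graph domChildren(const std::vector<int> &IDom) {
  Graph Kids(IDom.size());
  for (int B = 0; B < (int)IDom.size(); ++B)
    if (IDom[B] >= 0)
      Kids[IDom[B]].push_back(B);
  return Kids;
}

// Append to Worklist every successor of a block in Root's dominator subtree
// that lies outside that subtree. Root itself is never appended, because it
// is in the subtree set, mirroring the patch's AppendDomFrontier lambda.
void appendLocalFrontier(int Root, const Graph &Succs, const Graph &Kids,
                         std::vector<int> &Worklist) {
  assert(Root >= 0 && Root < (int)Kids.size() && "Root must be a block");

  // Pre-order flatten of the subtree. Nodes.size() is re-read on every
  // iteration because the loop appends children as it goes.
  std::vector<int> Nodes = {Root};
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    const std::vector<int> &Children = Kids[Nodes[I]];
    Nodes.insert(Nodes.end(), Children.begin(), Children.end());
  }

  // Blocks dominated by Root, for fast membership tests.
  std::unordered_set<int> InSubtree(Nodes.begin(), Nodes.end());

  for (int N : Nodes)
    for (int S : Succs[N])
      if (!InSubtree.count(S) &&
          std::find(Worklist.begin(), Worklist.end(), S) == Worklist.end())
        Worklist.push_back(S); // linear dedup keeps insertion order
}
```

With edges 0->1, 1->{2,3}, 2->{2,3}, the frontier of block 2 is `{3}`, while block 1's subtree contains every successor, so nothing is appended.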

In good news, with this patch, turning on simple unswitch passes the LLVM test
suite for me with asserts enabled.

Diff Detail

Build Status

Buildable 6375
Build 6375: arc lint + arc unit

Event Timeline

chandlerc created this revision. May 2 2017, 3:54 AM

Herald added a subscriber: mcrosier. May 2 2017, 3:54 AM

sanjoy requested changes to this revision. May 6 2017, 7:09 PM

sanjoy added inline comments.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
129	Minor stylistic thing: this lambda is large enough that I'd prefer pulling it out into a static helper. I'd also call the parameter something more descriptive than `Node`.
139	This could be `DomNodes.insert(DomNodes.end(), DomNodes[i]->begin(), DomNodes[i]->end());`
147	I'd be tempted to sort and then binary search instead of building an intermediate set.
154	You're not really computing the dominance frontier here -- the condition for that would be `!DomSet.count(SuccBB) || SuccBB == Node`. I think the algorithm is correct overall, but this difference from a textbook dominance frontier needs to be documented.
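The difference Sanjoy points out can be shown with a brute-force sketch on a toy CFG (hypothetical code, not from the patch; `int` block IDs and an `IDom` array with `-1` for the entry are assumptions). With `Textbook` set, the root is included when a block in its dominator subtree branches back to it, since the root dominates that predecessor but does not strictly dominate itself; without it, the root is left out, as in the patch.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Brute-force dominance frontier of Root.
// Textbook == true: Y is in DF(Root) if Root dominates a predecessor of Y
// but does not strictly dominate Y, so Root itself can qualify.
// Textbook == false: Root is never included, matching the patch.
std::set<int> frontier(int Root, const Graph &Succs,
                       const std::vector<int> &IDom, bool Textbook) {
  assert(Root >= 0 && Root < (int)IDom.size() && "Root must be a block");

  // Blocks dominated by Root (including Root): walk each block's IDom chain.
  std::set<int> Dominated;
  for (int B = 0; B < (int)IDom.size(); ++B)
    for (int A = B; A != -1; A = IDom[A])
      if (A == Root) {
        Dominated.insert(B);
        break;
      }

  std::set<int> DF;
  for (int N : Dominated)
    for (int S : Succs[N])
      if (!Dominated.count(S) || (Textbook && S == Root))
        DF.insert(S);
  return DF;
}
```

For a loop 0->1, 1->{2,3}, 2->1, the textbook frontier of the header (block 1) is `{1}`, while the patch-style variant is empty. As Chandler notes, leaving the root out does not change the result in the patch, because the root came from the already-deduplicated worklist.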

This revision now requires changes to proceed. May 6 2017, 7:09 PM

The algorithm you have is, I think, correct to start with (although, as Sanjoy pointed out, it doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
112	s/onte/on/ or am I wrong?

In D32740#748894, @davide wrote:

The algorithm you have is, I think, correct to start with (although, as Sanjoy pointed out, it doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
129	I somewhat prefer it as a lambda because that lets the underlying (and persisting) data structures be very elegantly captured rather than having to clutter up the interface. But if you feel strongly, I'll hoist it out. That said, a better parameter name is always welcome. Any suggestions though? I couldn't come up with anything...
147	I have a lame reason to not do that: it makes the order of the worklist non-deterministic because I have to sort by pointer order. At the point where I have to keep two lists, it seems like a set is just simpler code-wise.
154	Added a comment. In fact, we could "append" itself without changing behavior because the worklist is deduplicated and this node came from the worklist. But it seems good to document that we're intentionally not including this aspect of the dominance frontier.
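Sanjoy's sort-and-binary-search suggestion (line 147), together with the "two lists" Chandler mentions, might look like the sketch below. This is hypothetical illustration code, not from the patch. Block IDs are `int`s, so sorting them is deterministic here. Sorting `BasicBlock *` values by address is what raises the ordering concern. The pre-order list is kept separately so the emission order never depends on the sort.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// PreOrder: the flattened dominator subtree, in pre-order.
// Returns successors of subtree blocks that fall outside the subtree,
// deduplicated, in the order they are first found while walking PreOrder.
std::vector<int> frontierViaSortedLookup(const std::vector<int> &PreOrder,
                                         const Graph &Succs) {
  // Second list: a sorted copy used only for O(log n) membership tests.
  std::vector<int> Sorted(PreOrder);
  std::sort(Sorted.begin(), Sorted.end());

  std::vector<int> Out;
  for (int N : PreOrder)
    for (int S : Succs[N])
      if (!std::binary_search(Sorted.begin(), Sorted.end(), S) &&
          std::find(Out.begin(), Out.end(), S) == Out.end())
        Out.push_back(S);
  return Out;
}
```

Compared with a hash set, this trades a sort per call for cheaper lookups, but it still needs the second (pre-order) list to keep the output order tied to the IR.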

Update based on review comments.

I have an intern starting in two weeks to do incremental dominator algorithms :)

In D32740#752838, @chandlerc wrote:

In D32740#748894, @davide wrote:

The algorithm you have is, I think, correct to start with (although, as Sanjoy pointed out, it doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

TBQH, both of the comments seem quite wrong.

Yes, it computes dom tree levels each time. This is trivially fixable, but historically it has never been a time sink.

You say:

//  It includes extra complexity and logic to allow PHI placement which is
//    somewhat fundamentally a harder problem than the one we're trying to
//    solve here, so we can get away with a simpler approach. Specifically,
//    we don't need to do any live-in pruning.

No, it isn't, and no, it doesn't :)
At least, the issues you describe are ... not a good reason to avoid it.
The problem of proper phi placement is not "fundamentally harder"; it's quite literally identical. :)
Second, the live-in pruning is completely optional, and costs nothing when it is off.
As an example, MemorySSA does not use the live-in pruning.

If you do not set the live in blocks, it simply does not use them.
This is why the comment says "/// By default, liveness is not used to prune the IDF computation."

You also could use them in your case if you wanted a smaller and more optimal answer.
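As a rough illustration of the point that live-in pruning is optional, here is a minimal C++ sketch of an iterated dominance frontier over a toy CFG. It is not LLVM's `IDFCalculator`, which, per the discussion above, computes dominator tree levels. Instead it precomputes per-block frontiers with the predecessor-walk method from Cooper, Harvey and Kennedy and then runs a worklist. Block IDs are `int`s and every name is hypothetical. Passing a null `LiveIn` skips the pruning check entirely; passing a set restricts both the result and the exploration to blocks in it.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>;
using SetList = std::vector<std::set<int>>;

// Dominance frontiers for every block (assumes all blocks are reachable).
// For each join block B, walk up the dominator tree from each predecessor
// until reaching IDom[B], adding B to the frontier of every block visited.
SetList dominanceFrontiers(const Graph &Succs, const std::vector<int> &IDom) {
  Graph Preds(Succs.size());
  for (int B = 0; B < (int)Succs.size(); ++B)
    for (int S : Succs[B])
      Preds[S].push_back(B);

  SetList DF(Succs.size());
  for (int B = 0; B < (int)Succs.size(); ++B) {
    if (Preds[B].size() < 2)
      continue;
    for (int P : Preds[B])
      for (int Runner = P; Runner != IDom[B]; Runner = IDom[Runner])
        DF[Runner].insert(B);
  }
  return DF;
}

// Iterated dominance frontier DF+(Defs). If LiveIn is non-null, blocks not
// in it are neither added to the result nor explored (live-in pruning).
// If LiveIn is null, no pruning happens at all.
std::set<int> iteratedFrontier(const std::vector<int> &Defs, const SetList &DF,
                               const std::set<int> *LiveIn = nullptr) {
  std::set<int> Result;
  std::set<int> Queued(Defs.begin(), Defs.end());
  std::vector<int> Work(Defs.begin(), Defs.end());
  while (!Work.empty()) {
    int X = Work.back();
    Work.pop_back();
    for (int Y : DF[X]) {
      if (LiveIn && !LiveIn->count(Y))
        continue;
      // Record Y; only explore it if it was not already queued.
      if (Result.insert(Y).second && Queued.insert(Y).second)
        Work.push_back(Y);
    }
  }
  return Result;
}
```

On a diamond 0->{1,2}, 1->3, 2->3, the IDF of `{1}` is `{3}` without pruning and empty with an empty live-in set.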

(The mechanism you are currently using will go badly N^2 on certain types of CFGs.)
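One generic way to see how a subtree-flattening approach can turn quadratic is sketched below. The thread does not pin down which CFG shapes trigger this in the patch, so treat the scenario as an assumption. If the same deep subtree gets re-flattened over and over, for example once from every node of a chain-shaped dominator tree, the total work is N + (N-1) + ... + 1 = N(N+1)/2 node visits. The hypothetical `countFlattenVisits` only counts those visits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the nodes visited when a pre-order subtree flatten (like the one in
// AppendDomFrontier) is run once from every node of a chain-shaped dominator
// tree 0 -> 1 -> ... -> N-1, where node i's subtree is {i, ..., N-1}.
long long countFlattenVisits(int N) {
  // Children lists for the chain: node i's only child is i + 1.
  std::vector<std::vector<int>> Kids(N);
  for (int I = 0; I + 1 < N; ++I)
    Kids[I].push_back(I + 1);

  long long Visits = 0;
  for (int Root = 0; Root < N; ++Root) {
    std::vector<int> Nodes = {Root};
    for (std::size_t I = 0; I < Nodes.size(); ++I) {
      const std::vector<int> &Children = Kids[Nodes[I]];
      Nodes.insert(Nodes.end(), Children.begin(), Children.end());
    }
    Visits += (long long)Nodes.size();
  }
  return Visits;
}
```

For N = 100 this counts 5050 visits, and for N = 1000 it counts 500500, matching N(N+1)/2.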

sanjoy added inline comments. May 12 2017, 11:49 AM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
129	SGTM to the first. I can't immediately think of something better than `Node` either, SGTM to the second too.
147	I don't see why non-determinism in the worklist order matters here.

Getting this out of my review queue (I'm still waiting on the "I don't see why non-determinism in the worklist order matters here." bit).

This revision now requires changes to proceed. May 19 2017, 4:19 PM

Update the comments to be more accurate and re-arrange the FIXME.

The homegrown dominance frontier logic is not great, but I don't think it's a significant blocker and can be refactored in the future.
I.e., I'm much more interested in understanding whether this thing sticks together (including the non-trivial unswitch, once you get to it).
I'm happy with this going in when Danny/Sanjoy are.

In D32740#761657, @davide wrote:

The homegrown dominance frontier logic is not great, but I don't think it's a significant blocker and can be refactored in the future.
I.e., I'm much more interested in understanding whether this thing sticks together (including the non-trivial unswitch, once you get to it).
I'm happy with this going in when Danny/Sanjoy are.

To clarify, I'm not talking about the code per se, which I think is good. I mean that we have two pieces of code which could be shared but currently are not.

In D32740#753141, @dberlin wrote:

I have an intern starting in two weeks to do incremental dominator algorithms :)

Very excited to move this code to something better. But I'd also like to make progress here.

In D32740#752838, @chandlerc wrote:

In D32740#748894, @davide wrote:

The algorithm you have is, I think, correct to start with (although, as Sanjoy pointed out, it doesn't really compute the dominance frontier).
Have you considered computing the iterated dominance frontier every time instead? If there are technical difficulties (or it's just slower) I'd add a comment to the code explaining why we can't use it.

Added some comments about why the existing IDF calculator seems like a bad fit for what we want here.

TBQH, both of the comments seem quite wrong.

Yes, it computes dom tree levels each time. This is trivially fixable, but historically it has never been a time sink.

You say:
//  It includes extra complexity and logic to allow PHI placement which is
//    somewhat fundamentally a harder problem than the one we're trying to
//    solve here, so we can get away with a simpler approach. Specifically,
//    we don't need to do any live-in pruning.
No, it isn't, and no, it doesn't :)
At least, the issues you describe are ... not a good reason to avoid it.
The problem of proper phi placement is not "fundamentally harder"; it's quite literally identical. :)
Second, the live-in pruning is completely optional, and costs nothing when it is off.
As an example, MemorySSA does not use the live-in pruning.

If you do not set the live in blocks, it simply does not use them.
This is why the comment says "/// By default, liveness is not used to prune the IDF computation."

You also could use them in your case if you wanted a smaller and more optimal answer.

Ok, I've updated the comments. I agree these are fixable issues, but they're still issues. I'm happy to refactor this to share code if that's useful and I've updated the comment to reflect that we should factor things in a way that makes it easy to share.

The issue I was *trying* to get at in the second point was that the API isn't terribly convenient for this use case (but looks very much like the API that I'd want for PHI placement). That's not a comment on the actual logic; I completely agree that could be shared. I have a mild preference to first write the other domtree updates I'm going to need in this code, so I better understand what API I want, before rewriting the API used to access the IDF calculator logic. And it isn't a lot of code (30 lines including lots of comments). But if you're really worried about having the logic local, I can plumb through the IDF interface bits.

(The mechanism you are currently using will go badly N^2 on certain types of CFGs.)

Is this different from IDF calculator? Is that difference the domtree levels? If it is some other difference, I don't really understand it, and would appreciate an explanation. It's possible I'm just missing it, but I'm not seeing how the two meaningfully differ.

If it is just the levels, I'm not sure how to address this without adding a different N^2 due to walking the entire domtree rather than just a localized region. At that point, I'd rather just recalculate at the end of unswitching.

-Chandler

chandlerc added inline comments. May 22 2017, 10:09 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
147	That changes the order of children in the domtree, which in turn can change the output of anything that walks the domtree. Typically, the domtree's order is fully determined by the order of the IR in the function. I'm not worried about having any particular order, but I'm worried about non-deterministic order as I suspect other passes transform code in ways influenced by the domtree visit order. Does this seem unreasonable?
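The determinism concern can be illustrated with a minimal insertion-ordered, deduplicating worklist, similar in spirit to LLVM's `SetVector`/`SmallSetVector`. The toy class `OrderedWorklist` below is hypothetical, not LLVM's implementation. Iteration follows insertion order, which here derives from the IR, so it is reproducible. Iterating a container ordered by pointer value would instead follow allocation addresses, which can differ between runs.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Deduplicating worklist that remembers insertion order. It can be grown
// while being walked by index (as the patch's worklist loop does), provided
// elements are copied out rather than held by reference across an insert.
template <typename T> class OrderedWorklist {
public:
  // Returns true if V was not already present.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Items.push_back(V);
    return true;
  }
  std::size_t size() const { return Items.size(); }
  const T &operator[](std::size_t I) const { return Items[I]; }

private:
  std::vector<T> Items;       // iteration order == insertion order
  std::unordered_set<T> Seen; // membership only; its order is never used
};
```

Inserting 3, 1, 3, 2 leaves the worklist as `[3, 1, 2]`: the duplicate is rejected and the first-seen order is kept.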

Fix comment even harder.

Harbormaster completed remote builds in B6668: Diff 99846. May 22 2017, 10:17 PM

sanjoy added inline comments. May 22 2017, 11:34 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
147	I missed that this would change the order of children. Given that, what's here LGTM.

FYI, I'm still waiting to hear back from you, Danny...

In my subsequent patch, I actually start recomputing the entire dominator tree as the updates become far too complex to make sense. However, if possible I'd like to keep the update logic here as the next patch goes and immediately *queries* the dominator tree to do non-trivial unswitching cost estimation.

But I may take a shot at removing that and moving to *just* recomputing the dominator tree until we have the incremental update facilities properly integrated.

Over the past week or so, I think we have done enough playing around with incremental dominators (and I believe Jakub has working prototype code) that I feel confident we can make it work, so I'm not really worried about this patch.
Maybe it's code we destroy in a few months, but it doesn't seem worth holding up progress over.
It could obviously turn out I'm completely wrong (i.e., as we bring a design upstream and start on a non-prototype, something becomes intractable), but we can still deal with it then.

In D32740#764113, @dberlin wrote:

Over the past week or so, I think we have done enough playing around with incremental dominators (and I believe Jakub has working prototype code) that I feel confident we can make it work, so I'm not really worried about this patch.
Maybe it's code we destroy in a few months, but it doesn't seem worth holding up progress over.
It could obviously turn out I'm completely wrong (i.e., as we bring a design upstream and start on a non-prototype, something becomes intractable), but we can still deal with it then.

Cool.

I'm gonna land this and clear the way for some follow-up patches. I'm somewhat hopeful that I can even kill this off before we have proper updates as we'll have to recalculate the whole function's tree when doing non-trivial unswitching. But even if not, I think this is an OK stop-gap and I've got the FIXME in here to get something saner.

Closed by commit rL303843: [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch (authored by chandlerc). May 24 2017, 11:33 PM

This revision was automatically updated to reflect the committed changes.

Diff 98719

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

   for (auto *IDom = UnswitchedNode->getIDom(); IDom != OldPHNode;
        IDom = IDom->getIDom())
     DomChain.insert(IDom);
 
   // The unswitched block ends up immediately dominated by the old preheader --
   // regardless of whether it is the loop exit block or split off of the loop
   // exit block.
   DT.changeImmediateDominator(UnswitchedNode, OldPHNode);
 
-  // Blocks reachable from the unswitched block may need to change their IDom
-  // as well.
+  // For everything that moves up the dominator tree, we need to examine the
+  // dominator frontier to see if it additionally should move up the dominator
+  // tree. This lambda appends the dominator frontier for a node on the
+  // worklist.
+  //
+  // FIXME: This uses a naive algorithm for computing the dominator frontier.
+  //
+  // Note that we don't want to use the IDFCalculator here for a number of
+  // reasons:
+  // 1) It computes dominator tree levels for the entire function every time
+  //    you run it, we're trying to avoid that so that we can repeatedly update
+  //    disjoint loops in the domtree without re-walking the entire thing.
+  // 2) It includes extra complexity and logic to allow PHI placement which is
+  //    somewhat fundamentally a harder problem than the one we're trying to
+  //    solve here, so we can get away with a simpler approach. Specifically,
+  //    we don't need to do any live-in pruning.
   SmallSetVector<BasicBlock *, 4> Worklist;
-  for (auto *SuccBB : successors(UnswitchedBB))
-    Worklist.insert(SuccBB);
+  SmallVector<DomTreeNode *, 4> DomNodes;
+  SmallPtrSet<BasicBlock *, 4> DomSet;
+  auto AppendDomFrontier = [&](DomTreeNode *Node) {
+    assert(DomNodes.empty() && "Must start with no dominator nodes.");
+    assert(DomSet.empty() && "Must start with an empty dominator set.");
+
+    // First flatten this subtree into sequence of nodes by doing a pre-order
+    // walk.
+    DomNodes.push_back(Node);
+    // We intentionally re-evaluate the size as each node can add new children.
+    // Because this is a tree walk, this cannot add any duplicates.
+    for (int i = 0; i < (int)DomNodes.size(); ++i)
+      DomNodes.insert(DomNodes.end(), DomNodes[i]->begin(), DomNodes[i]->end());
+
+    // Now create a set of the basic blocks so we can quickly test for
+    // dominated successors. We could in theory use the DFS numbers of the
+    // dominator tree for this, but we want this to remain predictably fast
+    // even while we mutate the dominator tree in ways that would invalidate
+    // the DFS numbering.
+    for (DomTreeNode *InnerN : DomNodes)
+      DomSet.insert(InnerN->getBlock());
+
+    // Now re-walk the nodes, appending every successor of every node that isn't
+    // in the set. Note that we don't append the node itself, even though if it
+    // is a successor it does not strictly dominate itself and thus it would be
+    // part of the dominance frontier. The reason we don't append it is that
+    // the node passed in came from the worklist and so it has already been
+    // processed.
+    for (DomTreeNode *InnerN : DomNodes)
+      for (BasicBlock *SuccBB : successors(InnerN->getBlock()))
+        if (!DomSet.count(SuccBB))
+          Worklist.insert(SuccBB);
+
+    DomNodes.clear();
+    DomSet.clear();
+  };
+
+  // Append the initial dom frontier nodes.
+  AppendDomFrontier(UnswitchedNode);
 
   // Walk the worklist. We grow the list in the loop and so must recompute size.
   for (int i = 0; i < (int)Worklist.size(); ++i) {
     auto *BB = Worklist[i];
 
     DomTreeNode *Node = DT[BB];
     assert(!DomChain.count(Node) &&
            "Cannot be dominated by a block you can reach!");
-    // If this block doesn't have an immediate dominator somewhere in the chain
-    // we hoisted over, then its position in the domtree hasn't changed. Either
-    // it is above the region hoisted and still valid, or it is below the
-    // hoisted block and so was trivially updated. This also applies to
-    // everything reachable from this block so we're completely done with the
-    // it.
+    // If this block had an immediate dominator somewhere in the chain
+    // we hoisted over, then its position in the domtree needs to move as it is
+    // reachable from a node hoisted over this chain.
     if (!DomChain.count(Node->getIDom()))
       continue;
 
-    // We need to change the IDom for this node but also walk its successors
-    // which could have similar dominance position.
     DT.changeImmediateDominator(Node, OldPHNode);
-    for (auto *SuccBB : successors(BB))
-      Worklist.insert(SuccBB);
+    // Now add this node's dominator frontier to the worklist as well.
+    AppendDomFrontier(Node);
   }
 }
 
 /// Check that all the LCSSA PHI nodes in the loop exit block have trivial
 /// incoming values along this edge.
 static bool areLoopExitPHIsLoopInvariant(Loop &L, BasicBlock &ExitingBB,
                                          BasicBlock &ExitBB) {
   for (Instruction &I : ExitBB) {

test/Transforms/SimpleLoopUnswitch/trivial-unswitch.ll

 loop_exit2:
   %result2 = add i32 %result2.1, %result2.2
   ret i32 %result2
 ; CHECK: loop_exit2:
 ; CHECK-NEXT: %[[R1:.*]] = phi i32 [ %x, %entry ]
 ; CHECK-NEXT: %[[R2:.*]] = phi i32 [ %y, %entry ]
 ; CHECK-NEXT: %[[R:.*]] = add i32 %[[R1]], %[[R2]]
 ; CHECK-NEXT: ret i32 %[[R]]
 }
 
+; This test, extracted from the LLVM test suite, has an interesting dominator
+; tree to update as there are edges to sibling domtree nodes within child
+; domtree nodes of the unswitched node.
+define void @xgets(i1 %cond1, i1* %cond2.ptr) {
+; CHECK-LABEL: @xgets(
+entry:
+  br label %for.cond.preheader
+; CHECK: entry:
+; CHECK-NEXT: br label %for.cond.preheader
+
+for.cond.preheader:
+  br label %for.cond
+; CHECK: for.cond.preheader:
+; CHECK-NEXT: br i1 %cond1, label %for.cond.preheader.split, label %if.end17.thread.loopexit
+;
+; CHECK: for.cond.preheader.split:
+; CHECK-NEXT: br label %for.cond
+
+for.cond:
+  br i1 %cond1, label %land.lhs.true, label %if.end17.thread.loopexit
+; CHECK: for.cond:
+; CHECK-NEXT: br label %land.lhs.true
+
+land.lhs.true:
+  br label %if.then20
+; CHECK: land.lhs.true:
+; CHECK-NEXT: br label %if.then20
+
+if.then20:
+  %cond2 = load volatile i1, i1* %cond2.ptr
+  br i1 %cond2, label %if.then23, label %if.else
+; CHECK: if.then20:
+; CHECK-NEXT: %[[COND2:.*]] = load volatile i1, i1* %cond2.ptr
+; CHECK-NEXT: br i1 %[[COND2]], label %if.then23, label %if.else
+
+if.else:
+  br label %for.cond
+; CHECK: if.else:
+; CHECK-NEXT: br label %for.cond
+
+if.end17.thread.loopexit:
+  br label %if.end17.thread
+; CHECK: if.end17.thread.loopexit:
+; CHECK-NEXT: br label %if.end17.thread
+
+if.end17.thread:
+  br label %cleanup
+; CHECK: if.end17.thread:
+; CHECK-NEXT: br label %cleanup
+
+if.then23:
+  br label %cleanup
+; CHECK: if.then23:
+; CHECK-NEXT: br label %cleanup
+
+cleanup:
+  ret void
+; CHECK: cleanup:
+; CHECK-NEXT: ret void
+}
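To see why the @xgets test needs the frontier walk, here is a hedged C++ replay of both update strategies on that function's CFG after unswitching, with successors taken from its CHECK lines. It is a sketch, not the pass. Blocks are `int`s, the dominator tree is a plain `IDom` array, and `updateIDoms`, `kSuccs` and `kIncomingIDom` are hypothetical names. The starting tree is also an assumption: the original tree with `for.cond.preheader.split` inserted under the preheader. Under those assumptions, the old successor-only walk stops at `if.end17.thread`, whose idom is the hoisted `if.end17.thread.loopexit`, and never reaches `cleanup`. The frontier walk finds `cleanup`, sees that its idom `for.cond` is in the chain that was hoisted over, and re-parents it under `for.cond.preheader`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical numbering of @xgets after unswitching:
// 0 entry, 1 for.cond.preheader, 2 for.cond.preheader.split, 3 for.cond,
// 4 land.lhs.true, 5 if.then20, 6 if.else, 7 if.end17.thread.loopexit,
// 8 if.end17.thread, 9 if.then23, 10 cleanup.
const std::vector<std::vector<int>> kSuccs = {
    {1}, {2, 7}, {3}, {4}, {5}, {9, 6}, {3}, {8}, {10}, {10}, {}};

// Assumed dominator tree before the update (IDom[B], -1 for entry).
const std::vector<int> kIncomingIDom = {-1, 0, 1, 2, 3, 4, 5, 3, 7, 5, 3};

// Replays the update loop on an IDom array. UseFrontier == false mimics the
// old logic (direct successors only); true mimics the patch's frontier walk.
std::vector<int> updateIDoms(std::vector<int> IDom, bool UseFrontier) {
  const int OldPH = 1, Unswitched = 7;
  std::set<int> DomChain; // the nodes the unswitched block is hoisted over
  for (int D = IDom[Unswitched]; D != OldPH; D = IDom[D])
    DomChain.insert(D);
  IDom[Unswitched] = OldPH;

  std::vector<int> Worklist; // deduplicated, insertion-ordered
  auto Push = [&](int B) {
    if (std::find(Worklist.begin(), Worklist.end(), B) == Worklist.end())
      Worklist.push_back(B);
  };
  // Push successors of Root's current dominator subtree that lie outside it.
  auto AppendFrontier = [&](int Root) {
    std::vector<int> Nodes = {Root};
    for (std::size_t I = 0; I < Nodes.size(); ++I)
      for (int B = 0; B < (int)IDom.size(); ++B)
        if (IDom[B] == Nodes[I])
          Nodes.push_back(B);
    std::set<int> Sub(Nodes.begin(), Nodes.end());
    for (int N : Nodes)
      for (int S : kSuccs[N])
        if (!Sub.count(S))
          Push(S);
  };
  auto Expand = [&](int B) {
    if (UseFrontier)
      AppendFrontier(B);
    else
      for (int S : kSuccs[B])
        Push(S);
  };

  Expand(Unswitched);
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    int B = Worklist[I];
    assert(!DomChain.count(B) && "Cannot be dominated by a block you can reach!");
    if (!DomChain.count(IDom[B]))
      continue;
    IDom[B] = OldPH;
    Expand(B);
  }
  return IDom;
}
```

With the frontier walk the result matches the dominator tree of the unswitched CFG (`cleanup` ends up under block 1); the successor-only walk leaves `cleanup` under `for.cond` (block 3), which is the stale state the summary describes.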

This is an archive of the discontinued LLVM Phabricator instance.

[PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch pass.
ClosedPublic
