Page MenuHomePhabricator

[MachineBlockPlacement] Fix UnscheduledPredecessors counter to reflect tail duplication.
Needs ReviewPublic

Authored by huihuiz on Nov 8 2019, 3:13 PM.

Details

Summary

When filling worklist, UnscheduledPredecessors counter is increased whenever a MBB
within a BlockChain has any predecessor with different BlockChain. This counter is
decreased during buildChain (scheduled), by walking through every successor of MBB
within the BlockChain, decrease when successor BlockChain is different from current
BlockChain. Assuming BlockFilter condition is satisfied.

However, when a MBB is tail duplicated into it's predecessor, a previously increased
UnscheduledPredecessors counter may not be decreased. The BlockFilter condition may no
longer satisfied. Consider nested loop for example, the new predecessor and successor
are in different LoopBlockSet collected for outer and inner loop. Therefore, causing
an unreleased UnscheduledPredecessors counter for inner loop, causing assertion on
"Attempting to place block with unscheduled predecessors in worklist." when trying to
fill worklist later on.

This patch fixes pr44122.

Diff Detail

Event Timeline

huihuiz created this revision.Nov 8 2019, 3:13 PM
huihuiz added a comment.EditedNov 8 2019, 3:16 PM

This problem caused assertion on: void (anonymous namespace)::MachineBlockPlacement::fillWorkLists(const llvm::MachineBasicBlock *, SmallPtrSetImpl<(anonymous namespace)::BlockChain *> &, const (anonymous namespace)::MachineBlockPlacement::BlockFilterSet *, bool): Assertion `Chain.UnscheduledPredecessors == 0 && "Attempting to place block with unscheduled predecessors in worklist."' failed.

My apologies for not able to get a test case here. I try really hard to reduce a test case, but failed.

Carrot added a subscriber: xur.Nov 8 2019, 3:54 PM
huihuiz added a comment.EditedNov 8 2019, 3:59 PM

Adding some more detailed explanation:

When tail duplicator duplicate BB into its predecessor, for cases tail duplicator decides to delete BB:

PredBB [PredChain]                PredBB [PredChain]
     \                             /  \
      BB [BBChain]         =>   Succ0  Succ1
     /   \
Succ0 Succ1

For cases PredChain is different from BBChain, the UnscheduledPredecessors will be set to 1 for BBChain during fillWorkLists.
Assuming Succ1's BlockChain is equal to BBChain.
The previously set UnscheduledPredecessors for BBChain will not be decreased. Because Succ1 may not be included in the BlockFilter associated with the Loop of PredBB.
Since LoopBlockSet is collected at the very beginning of buildLoopChains, and tail duplication will not update the BlockFilter(LoopBlockSet).

Carrot added a comment.Nov 8 2019, 4:11 PM

More interesting problem is why fillWorkLists is called with a removed BB? From the code around of callers of fillWorkLists, we can see the removed BB is still in MachineFunction or MachineLoop data structure, we should also delete removed BB from there.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2957

Since BB is completely duplicated and removed, why not simply delete it from BlockToChain?

huihuiz added a comment.EditedNov 8 2019, 4:15 PM

Actually fillWorkLists is not called for the deleted BB, but the MBBs of BBChain. The BB is deleted, and removed from BBChain. Also removed from MachineLoopInfo.
Here BBChain will have an unreleased UnscheduledPredecessors counter, when calling fillWorkLists for any MBBs within BBChain, this assertion will happen.

Carrot added a comment.Nov 8 2019, 4:15 PM

The explanation looks helpful for understanding the problem.
thanks

huihuiz marked an inline comment as done.Nov 8 2019, 4:24 PM
huihuiz added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2957

The deleted BB is removed from its associated BlockChain, and removed from MachineBlockInfo.

huihuiz marked an inline comment as done.Nov 11 2019, 3:24 PM

Just realize I forget to mention how the BB is removed.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2907

This is the logic where BB is removed from the Chain and Chain Map. It's called in TailDuplicator::tailDuplicateAndUpdate(), when BB is dead and tailduplicator decides to removed it (removeDeadBlock).

Guys, any thoughts on this?

The original code doesn't correctly handle the case that the duplicated BB comes from a chain contains multiple BBs. So your patch looks reasonable to me. Could you enhance your comment to emphasize this case. It's better to make the comments of following loop more clearer. The current comments looks similar and confusing.

Considering duplicate a BB from a chain with multiple BBs, there is another bug in lambda RemovalCallback. If the duplicated BB is removed from a worklist, the following BB in same chain should be added to the worklist.

huihuiz updated this revision to Diff 229401.Nov 14 2019, 2:03 PM
huihuiz edited the summary of this revision. (Show Details)

Addressed reviewer feedback -- add more explanation in comments.

Carrot added inline comments.Nov 15 2019, 8:47 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2969

With current layout algorithm it is uncommon to build BBChain before placing PredBB, the most possible situation is BB is a loop header, and PredBB is not in the same loop. Could you write a test case with this hint?

2970

Think about it more and I found it is more complex than the code snippet.

  1. Even BB is not completely removed, the UnscheduledPredecessors should also be decremented by the number of actual duplications. The code should be similar to the following loop.
  1. In your sample code, there was a UnscheduledPredecessors for the edge PredBB -> BB, even BB is duplicated into PredBB, there is still an edge (PredBB + BB) -> Succ1, so there is another UnscheduledPredecessors, it doesn't impact the total number of UnscheduledPredecessors for BBChain. Only when BB doesn't fall through/jump to any other block in BBChain, then UnscheduledPredecessors can be decremented for the duplication. Could you double check if your failure is in this case?
huihuiz updated this revision to Diff 230705.Nov 22 2019, 12:11 PM
huihuiz edited the summary of this revision. (Show Details)
huihuiz added reviewers: bcahoon, kparzysz.

Add reduced test case

huihuiz marked 2 inline comments as done.Nov 22 2019, 12:19 PM
huihuiz added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2969

That's actually not right.

Consider, BBChain is the inner loop, PredBB is in the outer loop.
BBChain for inner loop is build before outer loop (PredChain)

2970

Please refer to my test case.

The updated CFG PredBB -> Succ1 cat not make sure UnscheduledPredecessors is released through markBlockSuccessors.
As PredBB is in the outer loop's LoopBlockSet, and Succ1 is in the inner loop's LoopBlockSet.

Ping.

Any objections with this approach?

The change looks good to me. Thanks for adding the test!

llvm/test/CodeGen/ARM/block-placement-tail-duplication.mir
5 ↗(On Diff #230705)

You might want to add REQUIRES: asserts. Or maybe explain what the CHECK-NOT: bb.14.bb58 means

huihuiz updated this revision to Diff 231770.Dec 2 2019, 1:20 PM

Updated test checking.

Removed CHECK lines, as we are checking this test no longer cause assertion crash.

I would like to merge this patch, please let me know if there are any objections?

Carrot added inline comments.Dec 2 2019, 8:40 PM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2932

Another related problem.
After RemBB is removed from WorkList, the following BB in the same chain should be added to WorkList.

2969

I can't figure out what's the difference between the two comments.

2970

When work on outer loop, the BBs in inner loop are also included in LoopBlockSet. When BB is duplicated into PredBB, the edge (PredBB + BB) -> Succ1 must be added to SuccChain.UnscheduledPredecessors. Fortunately the loop at line 2982 already handled this case. So we don't need to worry about it.

But you still need to decrement BBChain.UnscheduledPredecessors for actual duplication, even if BB is not completely duplicated and removed.

huihuiz marked 2 inline comments as done.Dec 2 2019, 9:02 PM
huihuiz added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2932

There is no need for special handling, when done building Loop Chain, all MBB within the function will be checked again to see if there's need to be added to WorkList. Refer to Line:2604

2970

When work on outer loop, the BBs in inner loop are NOT included in LoopBlockSet for the outer loop.

When BB is duplicated into PredBB, the edge (PredBB + BB) -> Succ1 must be added to SuccChain.UnscheduledPredecessors. Fortunately the loop at line 2982 already handled this case. So we don't need to worry about it.
-> also not right.
-> The PredBB is not in the LoopBlockSet for SuccChain, the SuccChain.UnscheduledPredecessors has been increased in respect to PredBB earlier in fillWorkList

But you still need to decrement BBChain.UnscheduledPredecessors for actual duplication, even if BB is not completely duplicated and removed.
-> if BB is not completely removed, then should leave to markBlockSuccessors to decrease the UnscheduledPredecessors counter

lkail added a subscriber: lkail.Dec 2 2019, 9:05 PM
Carrot added inline comments.Dec 3 2019, 8:51 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2932

Please keep in mind this file tries to layout code optimally.

2970

Please read the code in function collectLoopBlockSet. Also it is intentionally to leave cold BBs from inner loop to be processed with outer loop to get better I$ behavior.

For the handling of UnscheduledPredecessors. It records the number of cross chain edges.
In normal situation, it is constructed in fillWorkList, and decreased when predecessor is placed.
But when tail duplication occurred, the number of cross chain edges changed, so UnscheduledPredecessors must be adjusted accordingly. There are two possibilities:

  • reduced cross chain edges, this is the root cause of your problem, but a completely duplicated and removed BB is really a special case.
  • increased cross chain edges, it is handled by loop at line 2982, please read it carefully, it is a good template for your patch.

Hey @Carrot , really appreciated your time on it.

For the test case provided, I am pretty sure inner loop and outer loop are in different Loop Chain.

From your comments, I still don't understand are there any problems/concerns with the fix proposed?
Are you suggesting merging the proposed fix into loop Line:2982? But that's not very clear, I would prefer to have a separate logic.

Carrot added a comment.Dec 4 2019, 9:40 AM

Hey @Carrot , really appreciated your time on it.

For the test case provided, I am pretty sure inner loop and outer loop are in different Loop Chain.

This is correct. I already mentioned it in Nov 15. Thanks for the test case.

From your comments, I still don't understand are there any problems/concerns with the fix proposed?
Are you suggesting merging the proposed fix into loop Line:2982? But that's not very clear, I would prefer to have a separate logic.

The main problem is the fix should be applied to all duplication even if BB is not removed.

huihuiz updated this revision to Diff 232440.EditedDec 5 2019, 1:17 PM
huihuiz retitled this revision from [MachineBlockPlacement] Update UnscheduledPredecessors when tail duplicate delete a block. to [MachineBlockPlacement] Fix UnscheduledPredecessors counter to reflect tail duplication..
huihuiz edited the summary of this revision. (Show Details)

"The main problem is the fix should be applied to all duplication even if BB is not removed."
-> Make sense, thank you for pointing this out.

Please let me know if there are any different opinions?

Hey @Carrot ,

Please let me know if you have any further concerns?

Carrot added inline comments.Dec 9 2019, 4:04 PM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2976

Here if BBChain.UnscheduledPredecessors == 0, then there is something wrong.

2979

This condition also applies to your code, so the original if statement should not be changed.

huihuiz updated this revision to Diff 232957.Dec 9 2019, 4:18 PM
huihuiz marked 2 inline comments as done.
huihuiz added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2976

already guarded under "BBChain.UnscheduledPredecessors != 0"

2979

Yes, I was also debating about this.

Carrot added inline comments.Dec 10 2019, 8:37 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2976

You should not hide this kind of compiler internal error. Even do nothing is better than hide it. Of course a better choice is assert it.

3039

Replace BlockToChain[Pred] with PredChain.

huihuiz marked an inline comment as done.Dec 10 2019, 9:54 AM
huihuiz added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2976

I don't think so. Assert here may not be good idea.

Refer to line:620, this also make sure not decrementing on an already zero UnscheduledPredecessors counter.

// This is a cross-chain edge that is within the loop, so decrement the
 // loop predecessor count of the destination chain.
 if (SuccChain.UnscheduledPredecessors == 0 ||
     --SuccChain.UnscheduledPredecessors > 0)
   continue;

Please let me know if you can agree with this? I am trying not to waste each other's time on endless arguing.

Carrot added inline comments.Dec 11 2019, 8:46 AM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2976

Nobody wants to argue with you for no reason! If you want your patch to be accepted quickly, please try to understand the algorithm carefully.

This code looks also want to hide something wrong, so not a good practice.

I may change my idea if you can find some reasonable situation that SuccChain.UnscheduledPredecessors <= 0 when there are still cross chain edges flow to SuccChain.

arsenm added inline comments.Thu, Feb 13, 3:02 PM
llvm/test/CodeGen/ARM/block-placement-tail-duplication_pr44122.mir
8

I would expect you to be able to drop the IR section entirely and reduce this