This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Merge empty case blocks if no extra moves are added.
ClosedPublic

Authored by bmakam on Aug 31 2017, 11:03 AM.

Details

Summary

Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case.

Diff Detail

Repository
rL LLVM

Event Timeline

bmakam created this revision.Aug 31 2017, 11:03 AM
junbuml edited edge metadata.Sep 1 2017, 7:05 AM

I agree that skipping merging is not desirable when an extra mov is not actually added in the header of switch. If we can somehow estimate the generation of an extra mov in here, allowing merging must be a good choice. I'm not perfectly clear if checking constants is good enough for that estimation. However, it's very difficult to perfectly estimate that in CGP, so I'm okay with it if doing this is cheap enough and provide us with a reasonable level of estimation.

bmakam updated this revision to Diff 114502.Sep 9 2017, 10:05 PM
bmakam retitled this revision from [CodegenPrepare] Merge empty case blocks if no extra moves are added.(WIP) to [CGP] Merge empty case blocks if no extra moves are added..
bmakam edited the summary of this revision. (Show Details)
  • Added some more checks to allow merging empty cases when we can estimate extra moves are not added.
  • Added test cases.
  • spec2017/perlbench improves by 4.5%. No other improvements or regressions observed in Spec.

Please take a look.

kindly ping.

efriedma added inline comments.Sep 13 2017, 12:17 PM
lib/CodeGen/CodeGenPrepare.cpp
814 ↗(On Diff #114502)

Checking specifically for ConstantInt here seems a little weird... I would normally expect any constant to behave the same way.

bmakam updated this revision to Diff 115236.Sep 14 2017, 9:58 AM

Address Eli's comment.

bmakam marked an inline comment as done.Sep 14 2017, 9:58 AM
efriedma edited edge metadata.Sep 19 2017, 2:31 PM

I think this needs more testcases to illustrate the different possibilities here.

lib/CodeGen/CodeGenPrepare.cpp
737 ↗(On Diff #115236)

If you really need domtree, at least hoist computing it into it out into eliminateMostlyEmptyBlocks. (Computing it here leads to quadratic complexity for large functions.)

784 ↗(On Diff #115236)

Is this predicate significantly different from dominates(Pred, DestBB)? I mean, in theory it's slightly different, but the difference isn't relevant in your testcase.

816 ↗(On Diff #115236)

Leftover debug dump?

test/Transforms/CodeGenPrepare/skip-merging-case-block.ll
26 ↗(On Diff #115236)

Please try to avoid undef in testcases.

I think this needs more testcases to illustrate the different possibilities here.

Thanks for taking a look Eli. I'll add more testcases.

lib/CodeGen/CodeGenPrepare.cpp
737 ↗(On Diff #115236)

DT gets modified whenever an empty block is eliminated so I had it here. I agree this is not efficient, let me see if I can hoist it out safely.

784 ↗(On Diff #115236)

I'll add a testcase and hopefully it will be easier to understand.

816 ↗(On Diff #115236)

Oops! I will fix this.

test/Transforms/CodeGenPrepare/skip-merging-case-block.ll
26 ↗(On Diff #115236)

I will fix this.

bmakam updated this revision to Diff 116503.Sep 25 2017, 2:49 AM
bmakam edited the summary of this revision. (Show Details)

Address review comments:

  • Added more test cases.
  • Remove quadratic complexity to calculate DT.
  • Simplified the predicate to dominates(Pred, DestBB) as suggested by Eli.
bmakam marked 2 inline comments as done.Sep 25 2017, 2:53 AM

Remove quadratic complexity to calculate DT.

You're still computing DT once for every empty block?

bmakam added a comment.EditedSep 25 2017, 12:30 PM

Remove quadratic complexity to calculate DT.

You're still computing DT once for every empty block?

I'm now lazily computing DT only when an empty block can be merged into destination block that have all non-constant incoming phi values. In my tests I found I was computing DT for <10 times. If I have compute DT in eliminateMostlyEmptyBlocks I will be computing for all the functions even though we might not need it and will have to update the DT whenever an empty block is eliminated.

The real problem with computing DT there isn't how long it takes in common cases, it's that the total time will blow up quadratically for large functions.

Adding a couple more reviewers... the heuristic sort of makes sense to me, but I'd like another opinion.

hfinkel edited edge metadata.Sep 27 2017, 7:33 PM

The real problem with computing DT there isn't how long it takes in common cases, it's that the total time will blow up quadratically for large functions.

This certainly makes me nervous.

Can you just precompute the DT for the function? Maintaining the DT while removing mostly-empty blocks might not be hard.

Adding a couple more reviewers... the heuristic sort of makes sense to me, but I'd like another opinion.

lib/CodeGen/CodeGenPrepare.cpp
799 ↗(On Diff #116503)

You don't need to change this line (AFAIKT).

bmakam updated this revision to Diff 117074.Sep 28 2017, 4:17 PM

Precompute DT as suggested by Hal and Eli. Please take a look.

efriedma added inline comments.Sep 28 2017, 5:05 PM
lib/CodeGen/CodeGenPrepare.cpp
713 ↗(On Diff #117074)

This usage of "ModifiedDT" is weird; if the CFG is ever modified, you recompute every time you visit a basic blocK?

720 ↗(On Diff #117074)

New code should be using insertEdge/deleteEdge/applyUpdates to do domtree updates.

bmakam added inline comments.Sep 28 2017, 8:08 PM
lib/CodeGen/CodeGenPrepare.cpp
713 ↗(On Diff #117074)

In some corner cases [1], the entry block is removed and dominator tree cannot be notified of this change. In such cases we will have to recalculate the entire tree. This was primarily the reason why I was lazily computing DT earlier. Please let me know if there is a better way to handle this.
[1] test/Transforms/CodeGenPrepare/2008-11-24-RAUW-Self.ll

720 ↗(On Diff #117074)

I found this to be more easy to understand, I am happy to change this to DominatorTree::UpdateTypes/applyUpdates if this is the new coding standard.

bmakam added inline comments.Sep 28 2017, 8:38 PM
lib/CodeGen/CodeGenPrepare.cpp
713 ↗(On Diff #117074)

It seems like r314435 has taken care of this corner case in MergeBasicBlockIntoOnlyPred. So we can now simply pass DT to MergeBasicBlockIntoOnlyPred. I will rebase and update the patch.

bmakam added inline comments.Sep 28 2017, 11:42 PM
lib/CodeGen/CodeGenPrepare.cpp
720 ↗(On Diff #117074)

I tried using the new batch updater interface for DominatorTree. I create a vector of updates. First, insert edges from all predecessors of BB to DestBB. Second, remove edges from all predecessors of BB to BB and finally, delete edge from BB to DestBB. IIUC, the batch updater is efficient for multiple CFG updates, but in this case updating the IDom and erasing the Node seems more efficient to me.

+    eliminateMostlyEmptyBlock(BB);
     if (DT && !ModifiedDT) {
+/*
       BasicBlock *BBIDom = DT->getNode(BB)->getIDom()->getBlock();
       BasicBlock *DestBBIDom = DT->getNode(DestBB)->getIDom()->getBlock();
       BasicBlock *NewIDom = DT->findNearestCommonDominator(BBIDom, DestBBIDom);
       DT->changeImmediateDominator(DestBB, NewIDom);
       DT->eraseNode(BB);
+*/
+      SmallVector<DominatorTree::UpdateType, 3> Updates;
+      for (auto *PredBB : predecessors(BB)) {
+        if (PredBB == BB)
+          continue;
+        DominatorTree::UpdateType UT = {DominatorTree::Insert, PredBB, DestBB};
+        if (find(Updates, UT) == Updates.end()) {
+          Updates.push_back(UT);
+          Updates.push_back({DominatorTree::Delete, PredBB, BB});
+        }
+      }
+      if (!Updates.empty()) {
+        Updates.push_back({DominatorTree::Delete, BB, DestBB});
+        DT->applyUpdates(Updates);
+      }
     }
-    eliminateMostlyEmptyBlock(BB);

What do you think?

bmakam updated this revision to Diff 117093.Sep 28 2017, 11:56 PM

Use batch updater to inform DT about CFG updates.

bmakam updated this revision to Diff 117191.Sep 29 2017, 11:46 AM

Fix make-check asserts.

efriedma added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
349 ↗(On Diff #117191)

You should be able to guarantee DT is available by putting "AU.addRequired<DominatorTreeWrapperPass>();" in getAnalysisUsage(). That will simplify a bunch of null checks. (It's okay not to preserve it for now, but add a comment noting that you're intentionally not preserving it.)

720 ↗(On Diff #117074)

IIRC some later dom operations become more expensive after you use changeImmediateDominator, but I don't remember the details. Adding a couple people to the CC list who would know more.

bmakam updated this revision to Diff 117210.Sep 29 2017, 1:05 PM
bmakam marked an inline comment as done.

Address review changes.

lib/CodeGen/CodeGenPrepare.cpp
720 ↗(On Diff #117074)

Interesting. Thanks, Eli. Anyways, I have now updated the patch to not use changeImmediateDominator and use the new batch update interfaces.

dberlin added inline comments.Sep 29 2017, 1:14 PM
lib/CodeGen/CodeGenPrepare.cpp
720 ↗(On Diff #117074)

FWIW: One of our usual concerns is that what people think is the right way to change the dominator tree is often incorrect in practice.
As for the niceness or not niceness, one significant advantage is that this can update the postdominatortree as well.

What you see as "change a simple dominator", even assuming it is right all the time for dominators, is definitely not right for postdominators :)

Let me tackle the inefficiency and verbosity:

Verbosity is something i mentioned to Jakub, and we're thinking about the kind of Updater interface that hides a little more of this (and updates both dom and postdom).

So i guess i'd much prefer to use the thing you think is inefficient because we know it is right, unless the inefficiency actually

  1. is real
  2. matters.

(but we are completely aware the interface is a little verbose at the moment, and we're gathering data from how it's being used to know what to do to fix it)

Thanks Daniel/Eli, I learnt something new today and am glad I asked the question. :)

efriedma added inline comments.Oct 3 2017, 5:25 PM
lib/CodeGen/CodeGenPrepare.cpp
954 ↗(On Diff #117210)

"BB->getParent() == DestBB->getParent()"?

Also, you can't use the pointer "BB" after you free it (so this needs to be before the eraseFromParent() call.)

kuhar added inline comments.Oct 3 2017, 5:29 PM
lib/CodeGen/CodeGenPrepare.cpp
954 ↗(On Diff #117210)

@efriedma deleteEdge requires the edge to be actually deleted, so the right thing to do is:

1.  BB->removeFromParent()
2.  DT->deleteEdge(BB, DestBB)
3.  delete BB
kuhar added inline comments.Oct 3 2017, 5:36 PM
lib/CodeGen/CodeGenPrepare.cpp
776 ↗(On Diff #117210)

Isn't this the same thing as saying all_of(PNs, ... == ...)? I think that would be more straight-forward and easy to read here.

951 ↗(On Diff #117210)

I think that it'd be better to leave this micro-optimization for the updater to do :)
I'll add it there tomorrow.

bmakam updated this revision to Diff 117725.Oct 4 2017, 1:17 PM

Address review comments.

bmakam marked 3 inline comments as done.Oct 4 2017, 1:18 PM
efriedma added inline comments.Oct 4 2017, 1:53 PM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

"BB->getParent() == DestBB->getParent()" still here. What is that trying to check for?

bmakam added inline comments.Oct 4 2017, 2:18 PM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

err, I was trying to avoid hitting this assert from DT->deleteEdge:
void llvm::DominatorTreeBase<llvm::BasicBlock, false>::deleteEdge(NodeT *, NodeT *) [NodeT = llvm::BasicBlock, IsPostDom = false]: Assertion `From->getParent() == Parent' failed.

But looks like this code is unreachable, so I don't need to delete the edge BB, DestBB after BB->eraseFromParent()?

bmakam added inline comments.Oct 4 2017, 2:34 PM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

I think the right way to delete the edge is

  1. BB->getTerminator()->eraseFromParent()
  2. DT->deleteEdge(BB, DestBB)
  3. BB->eraseFromParent()

This way I don't hit the assert and I can delete the edge in DT.

bmakam updated this revision to Diff 117744.Oct 4 2017, 2:36 PM

fix the sequence of deleting an edge from DT.

(Gonna let jakub comment on what should work/not work here)

kuhar added inline comments.Oct 4 2017, 6:01 PM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

Are you sure that calling BB->removeFromParent() actually sets the BB's Parent pointer? The sequence of calls I mention is used in a couple of places quite some time now.

Your method looks correct as well, but I wouldn't call it the most intuitive one :).

bmakam added inline comments.Oct 5 2017, 7:17 AM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

Its possible I might be missing something here, but if I call BB->removeFromParent() and then DT->deleteEdge(BB, DestBB) I hit the above assert for 533 lit test cases with ninja check.

kuhar added inline comments.Oct 5 2017, 7:24 AM
lib/CodeGen/CodeGenPrepare.cpp
953 ↗(On Diff #117725)

Ah, I see, I missed that you have an edge BB -> DestBB and want to delete *BB*. In this case, your code looks fine.

bmakam updated this revision to Diff 117873.Oct 5 2017, 1:04 PM

Minor nit, use is_contained. NFCI.

Hi everyone,

Are there anymore comments on this patch? Thanks!

efriedma accepted this revision.Oct 26 2017, 2:10 PM

LGTM.

This revision is now accepted and ready to land.Oct 26 2017, 2:10 PM

Thanks for the reviews.

This revision was automatically updated to reflect the committed changes.