Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case.
Details
Diff Detail
Event Timeline
I agree that skipping merging is not desirable when an extra mov is not actually added in the header of switch. If we can somehow estimate the generation of an extra mov in here, allowing merging must be a good choice. I'm not perfectly clear if checking constants is good enough for that estimation. However, it's very difficult to perfectly estimate that in CGP, so I'm okay with it if doing this is cheap enough and provide us with a reasonable level of estimation.
- Added some more checks to allow merging empty cases when we can estimate extra moves are not added.
- Added test cases.
- spec2017/perlbench improves by 4.5%. No other improvements or regressions observed in Spec.
Please take a look.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
761 | Checking specifically for ConstantInt here seems a little weird... I would normally expect any constant to behave the same way. |
I think this needs more testcases to illustrate the different possibilities here.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
689 | If you really need domtree, at least hoist computing it into it out into eliminateMostlyEmptyBlocks. (Computing it here leads to quadratic complexity for large functions.) | |
734 | Is this predicate significantly different from dominates(Pred, DestBB)? I mean, in theory it's slightly different, but the difference isn't relevant in your testcase. | |
763 | Leftover debug dump? | |
test/Transforms/CodeGenPrepare/skip-merging-case-block.ll | ||
26 ↗ | (On Diff #115236) | Please try to avoid undef in testcases. |
Thanks for taking a look Eli. I'll add more testcases.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
689 | DT gets modified whenever an empty block is eliminated so I had it here. I agree this is not efficient, let me see if I can hoist it out safely. | |
734 | I'll add a testcase and hopefully it will be easier to understand. | |
763 | Oops! I will fix this. | |
test/Transforms/CodeGenPrepare/skip-merging-case-block.ll | ||
26 ↗ | (On Diff #115236) | I will fix this. |
Address review comments:
- Added more test cases.
- Remove quadratic complexity to calculate DT.
- Simplified the predicate to dominates(Pred, DestBB) as suggested by Eli.
Remove quadratic complexity to calculate DT.
You're still computing DT once for every empty block?
I'm now lazily computing DT only when an empty block can be merged into destination block that have all non-constant incoming phi values. In my tests I found I was computing DT for <10 times. If I have compute DT in eliminateMostlyEmptyBlocks I will be computing for all the functions even though we might not need it and will have to update the DT whenever an empty block is eliminated.
The real problem with computing DT there isn't how long it takes in common cases, it's that the total time will blow up quadratically for large functions.
Adding a couple more reviewers... the heuristic sort of makes sense to me, but I'd like another opinion.
This certainly makes me nervous.
Can you just precompute the DT for the function? Maintaining the DT while removing mostly-empty blocks might not be hard.
Adding a couple more reviewers... the heuristic sort of makes sense to me, but I'd like another opinion.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
766 | You don't need to change this line (AFAIKT). |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
665 | In some corner cases [1], the entry block is removed and dominator tree cannot be notified of this change. In such cases we will have to recalculate the entire tree. This was primarily the reason why I was lazily computing DT earlier. Please let me know if there is a better way to handle this. | |
672 | I found this to be more easy to understand, I am happy to change this to DominatorTree::UpdateTypes/applyUpdates if this is the new coding standard. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
665 | It seems like r314435 has taken care of this corner case in MergeBasicBlockIntoOnlyPred. So we can now simply pass DT to MergeBasicBlockIntoOnlyPred. I will rebase and update the patch. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
672 | I tried using the new batch updater interface for DominatorTree. I create a vector of updates. First, insert edges from all predecessors of BB to DestBB. Second, remove edges from all predecessors of BB to BB and finally, delete edge from BB to DestBB. IIUC, the batch updater is efficient for multiple CFG updates, but in this case updating the IDom and erasing the Node seems more efficient to me. + eliminateMostlyEmptyBlock(BB); if (DT && !ModifiedDT) { +/* BasicBlock *BBIDom = DT->getNode(BB)->getIDom()->getBlock(); BasicBlock *DestBBIDom = DT->getNode(DestBB)->getIDom()->getBlock(); BasicBlock *NewIDom = DT->findNearestCommonDominator(BBIDom, DestBBIDom); DT->changeImmediateDominator(DestBB, NewIDom); DT->eraseNode(BB); +*/ + SmallVector<DominatorTree::UpdateType, 3> Updates; + for (auto *PredBB : predecessors(BB)) { + if (PredBB == BB) + continue; + DominatorTree::UpdateType UT = {DominatorTree::Insert, PredBB, DestBB}; + if (find(Updates, UT) == Updates.end()) { + Updates.push_back(UT); + Updates.push_back({DominatorTree::Delete, PredBB, BB}); + } + } + if (!Updates.empty()) { + Updates.push_back({DominatorTree::Delete, BB, DestBB}); + DT->applyUpdates(Updates); + } } - eliminateMostlyEmptyBlock(BB); What do you think? |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
305 | You should be able to guarantee DT is available by putting "AU.addRequired<DominatorTreeWrapperPass>();" in getAnalysisUsage(). That will simplify a bunch of null checks. (It's okay not to preserve it for now, but add a comment noting that you're intentionally not preserving it.) | |
672 | IIRC some later dom operations become more expensive after you use changeImmediateDominator, but I don't remember the details. Adding a couple people to the CC list who would know more. |
Address review changes.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
672 | Interesting. Thanks, Eli. Anyways, I have now updated the patch to not use changeImmediateDominator and use the new batch update interfaces. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
672 | FWIW: One of our usual concerns is that what people think is the right way to change the dominator tree is often incorrect in practice. What you see as "change a simple dominator", even assuming it is right all the time for dominators, is definitely not right for postdominators :) Let me tackle the inefficiency and verbosity: Verbosity is something i mentioned to Jakub, and we're thinking about the kind of Updater interface that hides a little more of this (and updates both dom and postdom). So i guess i'd much prefer to use the thing you think is inefficient because we know it is right, unless the inefficiency actually
(but we are completely aware the interface is a little verbose at the moment, and we're gathering data from how it's being used to know what to do to fix it) |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
909 | "BB->getParent() == DestBB->getParent()"? Also, you can't use the pointer "BB" after you free it (so this needs to be before the eraseFromParent() call.) |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | "BB->getParent() == DestBB->getParent()" still here. What is that trying to check for? |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | err, I was trying to avoid hitting this assert from DT->deleteEdge: But looks like this code is unreachable, so I don't need to delete the edge BB, DestBB after BB->eraseFromParent()? |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | I think the right way to delete the edge is
This way I don't hit the assert and I can delete the edge in DT. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | Are you sure that calling BB->removeFromParent() actually sets the BB's Parent pointer? The sequence of calls I mention is used in a couple of places quite some time now. Your method looks correct as well, but I wouldn't call it the most intuitive one :). |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | Its possible I might be missing something here, but if I call BB->removeFromParent() and then DT->deleteEdge(BB, DestBB) I hit the above assert for 533 lit test cases with ninja check. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
910 | Ah, I see, I missed that you have an edge BB -> DestBB and want to delete *BB*. In this case, your code looks fine. |
You should be able to guarantee DT is available by putting "AU.addRequired<DominatorTreeWrapperPass>();" in getAnalysisUsage(). That will simplify a bunch of null checks. (It's okay not to preserve it for now, but add a comment noting that you're intentionally not preserving it.)