Page MenuHomePhabricator

DomTree: Make PostDomTree immune to block successors swap
ClosedPublic

Authored by yrouban on Jul 28 2020, 8:42 AM.

Details

Summary

This is another fix for the bug 46098 "PostDominatorTree is different than a freshly computed one!" with opt -branch-prob -instcombine -block-freq -verify-dom-info

One of the fixes is the patch D81089 proposed to mark InstCombine as non-PreserveCFG pass if its branch predicate canonicalization swapped any branch successors. It causes re-calculation of CFG analysis that may impact performance.

Another approach (work-in-progres D84491, D84492, D84493 and D84495) is to move all the branch successors swapping transformations from InstCombine to SimplifyCFG. It needs much work to change many tests and may affect many user pipelines.

This fix is proposed by @kuhar in D81089:
... to chose some arbitrary way of selecting a fake-entry node from nodes in an scc. Right now this order is whatever children or inverse_children return, but could be based on the order of blocks in a function instead ....
So, while looking for the furthest away node in a reverse unreachable subgraph this patch runs DFS with successors in their function order. This order is indifferent to the order of successors, so is the furthest away node.

Diff Detail

Event Timeline

yrouban created this revision.Jul 28 2020, 8:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 8:42 AM
Herald added a subscriber: mgrang. · View Herald Transcript
yrouban requested review of this revision.Jul 28 2020, 8:42 AM

Thanks for looking into this!
I like this fix more :)

kuhar requested changes to this revision.Jul 28 2020, 11:21 AM

Thanks for pushing on this.

While this makes root finding insensitive to successors' order, I think it still may be possible to run into tree instability by changing the order of tree children (also plain domtree), and in turn changing DFS numbers used by other analyses. In the case of a full tree construction, we could populate an ordering like this before tree construction to solve the tree children order issue, but I don't think this will work in case of incremental updates. AFAIK tree updates would still be based on some Cfg/Graph traits, which may give you unstable order.

Given that you want to fix InstCombine not to swap successors and move this to SimplifyCFG, does this patch improve anything in the current pass pipeline?

Overall, looks like a nice improvement to me, if there is no noticeable impact on compilation times. Have you benchmarked this patch?

llvm/include/llvm/Support/GenericDomTreeConstruction.h
187

nit: use llvm::sort

189

The if checks that the size is at least 2, but this assumes something stronger: that both A and B are in the map. Since the map is mutable, this would insert missing elements making debugging hard IMO.

391

nit: consider using llvm::Optional to stack-allocate the map.

445–447

If I understand correctly, this will go over the whole function and populate the SuccOrder map with O(|F|) pairs.
Can we instead only populate it with nodes in the same SCC? This will still include some unnecessary nodes (reverse-reachable loop entries), but should be much smaller.

This revision now requires changes to proceed.Jul 28 2020, 11:21 AM

Thanks for pushing on this.

While this makes root finding insensitive to successors' order, I think it still may be possible to run into tree instability by changing the order of tree children (also plain domtree), and in turn changing DFS numbers used by other analyses. In the case of a full tree construction, we could populate an ordering like this before tree construction to solve the tree children order issue, but I don't think this will work in case of incremental updates. AFAIK tree updates would still be based on some Cfg/Graph traits, which may give you unstable order.

Given that you want to fix InstCombine not to swap successors and move this to SimplifyCFG, does this patch improve anything in the current pass pipeline?

I don't believe that is the correct fix, because it literally says "nope, there is no way to update PDT after swapping successors other than full domtree recomputation", that is why i'm pushing for fixing PDT.

Overall, looks like a nice improvement to me, if there is no noticeable impact on compilation times. Have you benchmarked this patch?

Given that you want to fix InstCombine not to swap successors and move this to SimplifyCFG, does this patch improve anything in the current pass pipeline?

I don't believe that is the correct fix, because it literally says "nope, there is no way to update PDT after swapping successors other than full domtree recomputation", that is why i'm pushing for fixing PDT.

This patch doesn't allow for updating PDT either, it just makes it insensitive to that order. I guess this is consistent with saying that domtrees should be compared by checking if they are isomorphic, such that we don't have to care about updating forward dominators after successor swaps either.

Given that you want to fix InstCombine not to swap successors and move this to SimplifyCFG, does this patch improve anything in the current pass pipeline?

I don't believe that is the correct fix, because it literally says "nope, there is no way to update PDT after swapping successors other than full domtree recomputation", that is why i'm pushing for fixing PDT.

This patch doesn't allow for updating PDT either, it just makes it insensitive to that order.

Yep, which should be enough to avoid touching instcombine (marking it as not preserving cfg / moving successor swapping into simplifycfg)

I guess this is consistent with saying that domtrees should be compared by checking if they are isomorphic, such that we don't have to care about updating forward dominators after successor swaps either.

I think this is a good direction.

Taking a few steps back, I'd love to see a cfg preservation check like D81558 land for both IR and MIR. Without agreeing on the definition of data structure equality/equivalence (CFG, domtree, etc.), it's difficult to make progress if we don't know what exactly to preserve, let alone agreeing on how to preserve things. Historically, I think this assumptions mismatch has been a source of difficult address bug creep.

Taking a few steps back, I'd love to see a cfg preservation check like D81558 ...

I'm about to update the patches.

While this makes root finding insensitive to successors' order, I think it still may be possible to run into tree instability by changing the order of tree children (also plain domtree),...

I feel I do not understand. Any example?

and in turn changing DFS numbers used by other analyses. ...

I agree that DFS numbers are changed by successors swap, but can we make an exclusion for DFS numbers? If not, then this patch is useless because successors swap change CFG by definition (DFS numbers will be changes if we recalculate them).

AFAIK tree updates would still be based on some Cfg/Graph traits, which may give you unstable order.

As I see successors swap cannot be encoded for DomTreeUpdater. So this kind of change are not tracked.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
189

there should not be missing elements. All blocks of the parent are mappend. I will mark SuccOrder const.

445–447

I agree that adding only needed elements to the map would take less memory. Though it could complicate implementation and take more CPU cycles. Every time we need to add a bunch of new mapped nodes we need to iterate through the whole DT.Parent just to get its number in the DT.Parent iteration.

Taking a few steps back, I'd love to see a cfg preservation check like D81558 ...

I'm about to update the patches.

While this makes root finding insensitive to successors' order, I think it still may be possible to run into tree instability by changing the order of tree children (also plain domtree),...

I feel I do not understand. Any example?

and in turn changing DFS numbers used by other analyses. ...

I agree that DFS numbers are changed by successors swap, but can we make an exclusion for DFS numbers? If not, then this patch is useless because successors swap change CFG by definition (DFS numbers will be changes if we recalculate them).

AFAIK tree updates would still be based on some Cfg/Graph traits, which may give you unstable order.

As I see successors swap cannot be encoded for DomTreeUpdater. So this kind of change are not tracked.

Right. The key thing is that we don't care about the tree node children order, so not being able to track successor order in the updater is not an issue. It would be an issue for finding roots after/during updates, but since the updater bails out when roots change, we cannot do much there either.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
445–447

Ack. I think this is fine as long as the compilation times are not regressed.

One more thing: should this be backported for the upcoming 11.0 release?

One more thing: should this be backported for the upcoming 11.0 release?

From my point of view I'm just fixing the bug 46098.
I suggest that we work on this change with a normal priority unless the fix is needed in 11.0.

yrouban marked 2 inline comments as done.Jul 30 2020, 12:58 AM
yrouban updated this revision to Diff 281833.Jul 30 2020, 1:50 AM

Addressed comments.
Added a test case with two reverse unreachable subgraphs.

kuhar added a comment.Jul 31 2020, 8:00 AM

The code looks fine to me. Have you checked if compilation times are affected by this patch?

llvm/include/llvm/Support/GenericDomTreeConstruction.h
162–165

Can you update the comment and explain what SuccOrder is for?

yrouban marked 2 inline comments as done.Aug 2 2020, 11:31 PM
yrouban updated this revision to Diff 282504.Aug 2 2020, 11:35 PM

added comment on SuccOrder parameter of runDFS().

The code looks fine to me. Have you checked if compilation times are affected by this patch?

I have not checked compile time impact. I see 3 options:

  1. Land the patch and hope that any regression will be caught by @nikic's http://llvm-compile-time-tracker.com/
  2. Ask @nikic to check this particular patch before I land it
  3. Ask @nikic to add a custom perf/ branch for me to test this patch on my own
nikic added a comment.Aug 3 2020, 1:10 AM

Here's the numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=4fdc4d892b988bb9f2e06c3440971d28d6361722&to=63b8e6e79174411d9340790fa5e4f67bc73620a0&stat=instructions There's a measurable impact, but doesn't seem particularly concerning (geomean 0.1% regression).

llvm/include/llvm/Support/GenericDomTreeConstruction.h
394

You might want to use a SmallDenseMap, as number of successors is usually small.

yrouban added inline comments.Aug 3 2020, 2:32 AM
llvm/include/llvm/Support/GenericDomTreeConstruction.h
394

It is a successor set for all reverse unreachable nodes (like infinite loops). There are could be many of those. Ok let us try 16.

yrouban updated this revision to Diff 282544.Aug 3 2020, 2:34 AM

changed type of SuccOrder to SmallDenseMap<NodePtr, unsigned, 16>.

nikic added inline comments.Aug 3 2020, 4:10 AM
llvm/include/llvm/Support/GenericDomTreeConstruction.h
394

Oh, I see. Looks like switching to the SmallDenseMap does make things worse rather than better: https://llvm-compile-time-tracker.com/compare.php?from=63b8e6e79174411d9340790fa5e4f67bc73620a0&to=a0b0ea28c422a3e65c8df71b392f77047485f470&stat=instructions

So it's probably better to go with your original version here.

yrouban updated this revision to Diff 282579.Aug 3 2020, 4:33 AM

Reverted NodeOrderMap from SmallDenseMap to DenseMap.

@nikic reported that the geomean regression is 0.1%.
@kuhar, could you please lgtm?
Please consider taking look at D81558 (PreserveCFG checker).

kuhar accepted this revision.Aug 4 2020, 11:30 AM
This revision is now accepted and ready to land.Aug 4 2020, 11:30 AM
This revision was automatically updated to reflect the committed changes.

Sorry I had to revert in 1366d66a22a5f0d25fcc6e922118bb51ab22f8c1 : the MLIR build was broken.
Repro by adding -DLLVM_ENABLE_PROJECTS=mlir and try the target: ninja tools/mlir/lib/IR/CMakeFiles/obj.MLIRIR.dir/Dominance.cpp.o

I think the premerge check didn't run here because you didn't use arc to update your revision.

Thanks. I should have tested some more than just llvm.

This also broke the Polly buildbot twice: http://lab.llvm.org:8011/builders/polly-x86_64-linux/builds/3542 and http://lab.llvm.org:8011/builders/polly-x86_64-linux/builds/3547. Please also fix the Polly tests when you get notifications from the Polly buildbot.

This also broke the Polly buildbot twice: http://lab.llvm.org:8011/builders/polly-x86_64-linux/builds/3542 and http://lab.llvm.org:8011/builders/polly-x86_64-linux/builds/3547. Please also fix the Polly tests when you get notifications from the Polly buildbot.

ok. As I see, both failures have been fixed. Thanks @Meinersbur for fixing polly/test/Isl/Ast/alias_checks_with_empty_context.ll