This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it
Needs ReviewPublic

Authored by chill on Jun 23 2023, 8:54 AM.

Diff Detail

Event Timeline

chill created this revision.Jun 23 2023, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 8:54 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
chill requested review of this revision.Jun 23 2023, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 8:54 AM

Have you tried to measure compile-time impact?

llvm/lib/CodeGen/CodeGenPrepare.cpp
7042

This might work, but it's going to be expensive; deleting the edge before adding the other edges will force the whole tree to be reconstructed, I think.

Can we use the SplitBlockAndInsertIfThen/SplitBlockAndInsertIfThenElse utilities here?

nikic added a comment.Jun 23 2023, 1:32 PM

As a high level comment: If we always work on DTU, that would allow to only flush the DT updates when the DT is next accessed, rather than in each transform.

llvm/lib/CodeGen/CodeGenPrepare.cpp
618–619

Would it be hard to also preserve the DT though the handful of transforms above? If we did that, we could directly use the DT from the pass manager, and also preserve it afterwards as well.

chill added a comment.Jun 29 2023, 9:38 AM

Have you tried to measure compile-time impact?

Yeah, I measured with the whole stack as of today and the results aren't ... impressive. Basically, differences within error/noise margin, maybe with a tiny hint of regression.

chill marked an inline comment as done.Jun 29 2023, 9:38 AM
chill updated this revision to Diff 537375.Jul 5 2023, 8:57 AM
chill added a comment.Jul 5 2023, 9:19 AM

Update:

  • now we use a DT from the optimisation pipeline, instead of building a new one in the pass
  • made applying DT updates even lazier - they are accumulated in a lazy DomTreeUpdater and applied when/if the DT is requested by a transform (and also at the end of the pass, when the updated is destructed).
  • some more transforms are made to update the DT
  • one extra rebuild of the DT removed

On CTMark with LTO, measuring only the final link step, wall-clock time and instruction count, I get:

TimeInst
7zip-0.32%-0.18%
bullet0.15%-0.12%
clamav-0.61%-0.14%
consumer-typeset-0.06%-0.07%
kc-0.41%-0.31%
lencod-0.29%-0.05%
mafft0.34%-0.06%
spass-0.08%-0.11%
sqlite0.06%-0.11%
tramp3d-0.14%-0.14%

or -0.14%/-0.13% geomean, respectively.

llvm/lib/CodeGen/CodeGenPrepare.cpp
618–619

Done, mostly.
For eliminateMostlyEmptyBlocks and SplitIndirectBrCriticalEdges it's not entirely apparent how to update the DT/LI, so I just rebuild it
of those transforms made changes.

chill marked an inline comment as done.Jul 5 2023, 9:20 AM
dmgreen added inline comments.Jul 18 2023, 1:03 AM
llvm/lib/CodeGen/CodeGenPrepare.cpp
435

I think DTU->getDomTree() will already do this.

8343

Formatting?

chill updated this revision to Diff 541957.Jul 19 2023, 4:39 AM
chill marked 2 inline comments as done.

This seems to look OK, right? I know there are dragons here with some functions, but it looks like the compile time is better and it seems like a better design to keep the DomTree up to date, if we can do so without destroying compile time.

llvm/lib/CodeGen/CodeGenPrepare.cpp
607

This should be less indented?

690

ConstantFoldTerminator

896

As a general rule you should probably just run clang-format on any patch you upload :)

chill updated this revision to Diff 545540.Jul 31 2023, 1:54 AM
chill marked 3 inline comments as done.
nikic accepted this revision.Jul 31 2023, 1:05 PM

LGTM

llvm/lib/CodeGen/CodeGenPrepare.cpp
6180

This is a bit unfortunate, would be better if SplitEdge accepted DTU. But it looks like that would take a larger refactoring.

This revision is now accepted and ready to land.Jul 31 2023, 1:05 PM

fyi, I bisected a crash during codegen prepare to this commit. I'm trying to reduce a repro now.

Reduced:

struct a {
  a(decltype(nullptr));
  long b;
} *c;
enum { d };
struct t {
  _Atomic(a) e;
};
struct f : t {};
struct atomic {
  f g;
  void h() {
    f *j = &g;
    t *k = j;
    __c11_atomic_load(&k->e, d);
  }
};
a l(a *addr) {
  reinterpret_cast<atomic *>(addr)->h();
  return 0;
}
struct {
  void m(long v) {
    long w = sizeof(this);
    c = reinterpret_cast<a *>(w);
    a *o = c;
    l(&o[v]);
  }
} p;
a q();
long r;
bool u();
a s() {
  if (u()) return q();
  for (long i = r; i; i++) p.m(i);
  return nullptr;
}

Compiled w/ bin/clang -c /tmp/repro.cc -O2 -std=c++17

1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module '/tmp/repro.cc'.
4.      Running pass 'CodeGen Prepare' on function '@_Z1sv'
...
 #5 0x00005632704fe3b9 llvm::ilist_node_base<true>::isSentinel() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:45:36
 #6 0x00005632704fe3b9 llvm::ilist_node_base<true>::isKnownSentinel() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:46:41
 #7 0x00005632704fe3b9 llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false>::operator*() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:138:5
 #8 0x00005632704fe3b9 llvm::early_inc_iterator_impl<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false>>::operator*() /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/STLExtras.h:631:12
 #9 0x00005632704fe3b9 (anonymous namespace)::CodeGenPrepare::runOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/CodeGen/CodeGenPrepare.cpp:622:25
#10 0x0000563270a9d9ac llvm::FPPassManager::runOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:0:27
#11 0x0000563270aa3d61 llvm::FPPassManager::runOnModule(llvm::Module&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:13
#12 0x0000563270a9e008 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:0:27
chill reopened this revision.Aug 2 2023, 12:48 AM

Thank you! I'll look at it.

This revision is now accepted and ready to land.Aug 2 2023, 12:48 AM
chill planned changes to this revision.Aug 2 2023, 12:48 AM
chill added a comment.Aug 2 2023, 2:48 AM

I couldn't so far reproduce it (basic config with assertions and expensive checks).
Maybe llvm::make_early_inc_range(F) fails because of non-finalised deletion of blocks ... I'll keep on digging.

chill added a comment.Aug 2 2023, 2:53 AM

I couldn't so far reproduce it (basic config with assertions and expensive checks).
Maybe llvm::make_early_inc_range(F) fails because of non-finalised deletion of blocks ... I'll keep on digging.

Quite likely.

I did a bootstrap on a slightly modified version where I forced the integrity checks on the DT tree,
unfortunately that had the side-effect of forcing DTU updates.

When I remove the checks I get the crash.

chill updated this revision to Diff 548647.Aug 9 2023, 9:37 AM

Fixed the crash during compilation. Requesting the DT, while vising all the function blocks, can have the side effect of
deleting blocks, which can cause the traversal to access free/reused memory.

I think I fixed this by checking/making sure all the loops over the basic blocks fall into these cases:

  • basic traversal (e.g for (auto &BB : F) ... ) while not requesting the DT in the body of the loop.
  • traversal using an auxiliary vector of WeakTrackingVHs.
  • the main loops of the pass:
DTU->flush();
for (BasicBlock &BB : llvm::make_early_inc_range(F)) {

which can only delete the current block (in dupRetToEnableTailCallOpts).
The extra flush of the DTU (added in this update) ensures the current block is valid at the start of the first iteration
and the early increment ensures the block is valid at the start of each subsequent iteration.

This revision is now accepted and ready to land.Aug 9 2023, 9:37 AM
chill requested review of this revision.Aug 9 2023, 9:38 AM
chill added a reviewer: rupprecht.
chill updated this revision to Diff 556117.Sep 7 2023, 2:29 AM

rebase and ping?