Details
Diff Detail
Event Timeline
Have you tried to measure compile-time impact?
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
7042 | This might work, but it's going to be expensive; deleting the edge before adding the other edges will force the whole tree to be reconstructed, I think. Can we use the SplitBlockAndInsertIfThen/SplitBlockAndInsertIfThenElse utilities here? |
As a high level comment: If we always work on DTU, that would allow to only flush the DT updates when the DT is next accessed, rather than in each transform.
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
618–619 | Would it be hard to also preserve the DT though the handful of transforms above? If we did that, we could directly use the DT from the pass manager, and also preserve it afterwards as well. |
Yeah, I measured with the whole stack as of today and the results aren't ... impressive. Basically, differences within error/noise margin, maybe with a tiny hint of regression.
Update:
- now we use a DT from the optimisation pipeline, instead of building a new one in the pass
- made applying DT updates even lazier - they are accumulated in a lazy DomTreeUpdater and applied when/if the DT is requested by a transform (and also at the end of the pass, when the updated is destructed).
- some more transforms are made to update the DT
- one extra rebuild of the DT removed
On CTMark with LTO, measuring only the final link step, wall-clock time and instruction count, I get:
Time | Inst | |
7zip | -0.32% | -0.18% |
bullet | 0.15% | -0.12% |
clamav | -0.61% | -0.14% |
consumer-typeset | -0.06% | -0.07% |
kc | -0.41% | -0.31% |
lencod | -0.29% | -0.05% |
mafft | 0.34% | -0.06% |
spass | -0.08% | -0.11% |
sqlite | 0.06% | -0.11% |
tramp3d | -0.14% | -0.14% |
or -0.14%/-0.13% geomean, respectively.
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
618–619 | Done, mostly. |
This seems to look OK, right? I know there are dragons here with some functions, but it looks like the compile time is better and it seems like a better design to keep the DomTree up to date, if we can do so without destroying compile time.
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
607 | This should be less indented? | |
690 | ConstantFoldTerminator | |
896 | As a general rule you should probably just run clang-format on any patch you upload :) |
LGTM
llvm/lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
6180 | This is a bit unfortunate, would be better if SplitEdge accepted DTU. But it looks like that would take a larger refactoring. |
fyi, I bisected a crash during codegen prepare to this commit. I'm trying to reduce a repro now.
Reduced:
struct a { a(decltype(nullptr)); long b; } *c; enum { d }; struct t { _Atomic(a) e; }; struct f : t {}; struct atomic { f g; void h() { f *j = &g; t *k = j; __c11_atomic_load(&k->e, d); } }; a l(a *addr) { reinterpret_cast<atomic *>(addr)->h(); return 0; } struct { void m(long v) { long w = sizeof(this); c = reinterpret_cast<a *>(w); a *o = c; l(&o[v]); } } p; a q(); long r; bool u(); a s() { if (u()) return q(); for (long i = r; i; i++) p.m(i); return nullptr; }
Compiled w/ bin/clang -c /tmp/repro.cc -O2 -std=c++17
1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/tmp/repro.cc'. 4. Running pass 'CodeGen Prepare' on function '@_Z1sv' ... #5 0x00005632704fe3b9 llvm::ilist_node_base<true>::isSentinel() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:45:36 #6 0x00005632704fe3b9 llvm::ilist_node_base<true>::isKnownSentinel() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_node_base.h:46:41 #7 0x00005632704fe3b9 llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false>::operator*() const /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:138:5 #8 0x00005632704fe3b9 llvm::early_inc_iterator_impl<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::BasicBlock, true, false, void>, false, false>>::operator*() /home/rupprecht/src/llvm-project/llvm/include/llvm/ADT/STLExtras.h:631:12 #9 0x00005632704fe3b9 (anonymous namespace)::CodeGenPrepare::runOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/CodeGen/CodeGenPrepare.cpp:622:25 #10 0x0000563270a9d9ac llvm::FPPassManager::runOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:0:27 #11 0x0000563270aa3d61 llvm::FPPassManager::runOnModule(llvm::Module&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1481:13 #12 0x0000563270a9e008 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/rupprecht/src/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:0:27
I couldn't so far reproduce it (basic config with assertions and expensive checks).
Maybe llvm::make_early_inc_range(F) fails because of non-finalised deletion of blocks ... I'll keep on digging.
Quite likely.
I did a bootstrap on a slightly modified version where I forced the integrity checks on the DT tree,
unfortunately that had the side-effect of forcing DTU updates.
When I remove the checks I get the crash.
Fixed the crash during compilation. Requesting the DT, while vising all the function blocks, can have the side effect of
deleting blocks, which can cause the traversal to access free/reused memory.
I think I fixed this by checking/making sure all the loops over the basic blocks fall into these cases:
- basic traversal (e.g for (auto &BB : F) ... ) while not requesting the DT in the body of the loop.
- traversal using an auxiliary vector of WeakTrackingVHs.
- the main loops of the pass:
DTU->flush(); for (BasicBlock &BB : llvm::make_early_inc_range(F)) {
which can only delete the current block (in dupRetToEnableTailCallOpts).
The extra flush of the DTU (added in this update) ensures the current block is valid at the start of the first iteration
and the early increment ensures the block is valid at the start of each subsequent iteration.
I think DTU->getDomTree() will already do this.