Some targets may add required passes via TargetMachine::registerPassBuilderCallbacks(). We need to run those even under -O0. As an example, BPFTargetMachine adds BPFAbstractMemberAccessPass, a required pass. This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust usage of the NPM) by allowing us to share added passes like coroutines and sanitizers between -O0 and other optimization levels. Since callbacks may end up not adding passes, we need to check if the pass managers are empty before adding them, so PassManager now has an isEmpty() function. For example, polly adds callbacks but doesn't always add passes in those callbacks, so this is necessary to keep -debug-pass-manager tests' output from changing depending on if polly is enabled or not. Tests are a continuation of those added in https://reviews.llvm.org/D89083.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/BackendUtil.cpp | ||
---|---|---|
1444 | It should be possible to simplify this function a lot now. It currently duplicates nearly all logic (for sanitizers etc) between O0 and O1+. Or is that planned for a followup? |
clang/lib/CodeGen/BackendUtil.cpp | ||
---|---|---|
1444 | Yup, I have that ready to go after this :) |
IIUC, this is the NPM version of EP_EarlyAsPossible in legacy PM for O0, right? Between the choice of (1) [this patch and ] reusing all existing EP callbacks and letting optnone filtering out non-required passes (which I think does not work for -disable-O0-optnone) and (2) a separate EP callback, say, O0EPCallbacks, I would vote for the latter since it is more explicit and does exactly what we expect it to do. The cost is that we also need to add passes to O0EPCallbacks separately which I think is justified since required passes should be rare.
I took a look at the legacy PM, it adds EP_EarlyAsPossible and EP_EnabledOnOptLevel0 for -O0. I changed this to match EP_EarlyAsPossible, then we can manually add passes corresponding to EP_EnabledOnOptLevel0. EP_EnabledOnOptLevel0 was only used for sanitizers and coroutines, which are already handled. So this should take care of target specific required IR passes, since those are typically run as EP_EarlyAsPossible in the legacy PM.
(I'll update the documentation once we've agreed)
llvm/include/llvm/Passes/PassBuilder.h | ||
---|---|---|
609 | invokeO0EPCallbacks? I saw there is PassBuilder::invokePeepholeEPCallbacks. | |
llvm/lib/Passes/PassBuilder.cpp | ||
1667 | What I have in mind is a newly added O0EPCallbacks field in PassBuilder class. Then we can keep existing EPCallbacks (including PipelineStartEPCallbacks) for >O0 optimization pipeline. Yeah, then you need to add related passes to O0EPCallbacks (for BPF in this case). |
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1667 | It's a tradeoff between having to specify required passes in both O0EPCallbacks and PipelineStartEPCallbacks which is repetitive, versus making all callbacks in PipelineStartEPCallbacks run at -O0, meaning even optional passes in PipelineStartEPCallbacks will run at -O0 (may be skipped via optnone). The legacy PM chooses the first, and I'm inclined to keep it that way just for consistency. If we did go down the second route, we could just have a second TargetMachine API like TargetMachine::addO0Passes() which directly adds passes to a ModulePassManager. |
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1667 | This seems more in line with the LPM behavior for O0. |
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1667 |
Indeed.
I'm lost here. Do you mean to say the second?
Do you mean to say the first? This is not much different from adding to O0EPCallbacks in TargetMachine::registerPassBuilderCallbacks. My proposal to have both O0EPCallbacks and PipelineStartEPCallbacks is that I'm not sure why we want to run all EP callbacks at O0. Do we have use cases for that? |
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1667 | I am so sorry, yes I got them flipped. As of the current patch, we're not running all callbacks, just PipelineStartEPCallbacks which is in line with the legacy PM. |
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1667 |
Definitely agree that this is in line with legacy PM. I don't feel strongly to go either way. Since @echristo is actively tuning O0, it would be helpful to know his opinion. |
If you are not invoking all the needed callbacks in invokeRegisteredO0EPCallbacks(), I would suggest to instead provide a set of methods like invokePipelineStartEPCallbacks(), invokeOptimizerLastEPCallbacks() that allow the consumer to chose which callbacks to invoke. (Just those two would be sufficient for my purposes.)
I'd really like to avoid this dance: https://github.com/rust-lang/rust/blob/c71248b70870960af9993de4f31d3cba9bbce7e8/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp#L796 Where you need to separately collect all callbacks yourself, because the PassBuilder neither makes the necessary members public, nor provides a means to invoke the callbacks explicitly.
Looking at BackendUtil.cpp in Clang as well as the Rust code, I'm back to thinking that we should provide a way to to all callbacks. Then in the case of passes added via TargetMachine, we should extend TargetMachine::registerPassBuilderCallbacks to also take a PassBuilder::OptimizationLevel. Then targets can skip adding optional optimization passes at -O0. Does that make sense? It would allow us to clean up Clang and Rust.
In fact I see in AMDGPUTargetMachine::adjustPassManager a check for an optimization level, so some targets are already doing this, although it looks like it's more for codegen and not IR passes.
I'm more convinced now that my initial direction was the cleanest way. It would simplify users of PassBuilder quite a bit.
I don't want to include PassManager.h into TargetMachine.h just to get PassBuilder::OptimizationLevel, so either we could factor out OptimizationLevel into its own header file so that we can pass an OptimizationLevel to TargetMachine::registerPassBuilderCallbacks, or we could just pass a OnlyAddRequiredPasses bool to TargetMachine::registerPassBuilderCallbacks. But either way, that can happen in a future change when the need arises (such as when we implement AMDGPUTargetMachine::registerPassBuilderCallbacks).
https://reviews.llvm.org/D90486 for adding OptimizationLevel to registerPassBuilderCallbacks(), although this one should land first since that one needs to update PassBuilder::runRegisteredEPCallbacks() to call TargetMachine::registerPassBuilderCallbacks()
$ cmake \ -G Ninja \ -DCMAKE_BUILD_TYPE=Release \ -DCMAKE_C_COMPILER=$(command -v clang) \ -DCMAKE_CXX_COMPILER=$(command -v clang++) \ -DLLVM_CCACHE_BUILD=ON \ -DLLVM_ENABLE_PROJECTS="clang;polly" \ -DLLVM_TARGETS_TO_BUILD=X86 \ -DLLVM_USE_LINKER=$(command -v ld.lld) \ ../llvm $ ninja check-clang ... [0/1] Running the Clang regression tests -- Testing: 26861 tests, 64 workers -- Testing: llvm-lit: /tmp/llvm-project/llvm/utils/lit/lit/llvm/config.py:347: note: using clang: /tmp/llvm-project/build/bin/clang 0.. 10 FAIL: Clang :: CodeGen/lto-newpm-pipeline.c (3731 of 26861) ******************** TEST 'Clang :: CodeGen/lto-newpm-pipeline.c' FAILED ******************** Script: -- : 'RUN: at line 3'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O0 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-O0 : 'RUN: at line 5'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -O0 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-O0 : 'RUN: at line 7'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O1 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-OPTIMIZED : 'RUN: at line 9'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -O1 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-OPTIMIZED : 'RUN: at line 11'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O2 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-OPTIMIZED : 'RUN: at line 13'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -O2 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-OPTIMIZED : 'RUN: at line 15'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -O3 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-OPTIMIZED : 'RUN: at line 17'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -O3 /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-OPTIMIZED : 'RUN: at line 19'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -Os /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-OPTIMIZED : 'RUN: at line 21'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -Os /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-OPTIMIZED : 'RUN: at line 23'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=full -Oz /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-FULL-OPTIMIZED : 'RUN: at line 25'; /tmp/llvm-project/build/bin/clang -cc1 -internal-isystem /tmp/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null -fexperimental-new-pass-manager -fdebug-pass-manager -flto=thin -Oz /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 2>&1 | /tmp/llvm-project/build/bin/FileCheck /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -check-prefix=CHECK-THIN-OPTIMIZED -- Exit Code: 1 Command Output (stderr): -- /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c:34:24: error: CHECK-FULL-O0-NEXT: is not on the line after the previous match // CHECK-FULL-O0-NEXT: Running pass: BitcodeWriterPass ^ <stdin>:9:1: note: 'next' match was here Running pass: BitcodeWriterPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c ^ <stdin>:6:33: note: previous match ended here Running pass: NameAnonGlobalPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c ^ <stdin>:7:1: note: non-matching line after previous match is here Starting llvm::Function pass manager run. ^ Input file: <stdin> Check file: /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c -dump-input=help explains the following input dump. Input was: <<<<<< 1: Starting llvm::Module pass manager run. 2: Running pass: AlwaysInlinerPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 3: Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 4: Running analysis: ProfileSummaryAnalysis on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 5: Running pass: CanonicalizeAliasesPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 6: Running pass: NameAnonGlobalPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 7: Starting llvm::Function pass manager run. 8: Finished llvm::Function pass manager run. 9: Running pass: BitcodeWriterPass on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c next:34 !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: match on wrong line 10: Running analysis: ModuleSummaryIndexAnalysis on /tmp/llvm-project/clang/test/CodeGen/lto-newpm-pipeline.c 11: Running analysis: BlockFrequencyAnalysis on Foo 12: Running analysis: BranchProbabilityAnalysis on Foo 13: Running analysis: LoopAnalysis on Foo 14: Running analysis: DominatorTreeAnalysis on Foo 15: Running analysis: TargetLibraryAnalysis on Foo 16: Running analysis: PostDominatorTreeAnalysis on Foo 17: Finished llvm::Module pass manager run. >>>>>> -- ******************** Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. ******************** Failed Tests (1): Clang :: CodeGen/lto-newpm-pipeline.c Testing Time: 71.92s Unsupported : 756 Passed : 26079 Expectedly Failed: 25 Failed : 1 FAILED: tools/clang/test/CMakeFiles/check-clang cd /tmp/llvm-project/build/tools/clang/test && /usr/bin/python3.8 /tmp/llvm-project/build/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /tmp/llvm-project/build/tools/clang/test ninja: build stopped: subcommand failed.
Did you consider just removing the -NEXT suffix? Polly might not be the only case, other statically linked pass-plugins may also insert themselves at extension points.
@serge-sans-paille might have some input as well.
Are there passes that should be running at -O0 that aren't target specific? Otherwise the callbacks should only add passes if the optimization level isn't -O0.
At least for the legacy pass manager, there is an EP_EnabledOnOptLevel0 insertion point, that pass plugins can use to insert passes at -O0.
That should be obsolete by the OptimizationLevel parameter in the callbacks. The callbacks can inspect the OptimizationLevel to see if it's O0 or not when deciding to add a pass to the pipeline.
My point is that it is possible for a pass plugin (such as llvm/examples/Bye) that can add themselves to the -O0 pipeline. If those plugins are linked statically, it will break tests that assume that no passes are added at those extension points.
I don't quite understand why Polly breaks anything: Unlike the "Bye" pass, it only adds something to the pass pipeline when -polly is passed to the command line. If it helps, I could change Polly to not even add a callback unless enabled. However, this does not solve the general pass plugin problem.
With this change in its current state, it's ok to add callbacks, as long as they don't add passes at -O0.
Polly adds some callbacks. When I first submitted this, polly would add some callbacks to VectorizerStartEPCallbacks. Those callbacks didn't actually add any passes (I didn't investigate that, but it makes sense at -O0). However, the previous logic would add a FunctionPassManager if there were any callbacks, so we had an extra empty FunctionPassManager being run, which changed the output of -debug-pass-manager. Now I've changed it so that we check if the FunctionPassManager actually has any passes added to it before adding it to the ModulePassManager.
So previously it was
if (!VectorizerStartEPCallbacks.empty()) { FunctionPassManager FPM(DebugLogging); for (auto &C : VectorizerStartEPCallbacks) C(FPM, Level); MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM))); }
Now it is
if (!VectorizerStartEPCallbacks.empty()) { FunctionPassManager FPM(DebugLogging); for (auto &C : VectorizerStartEPCallbacks) C(FPM, Level); if (!FPM.isEmpty()) MPM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM))); }
which should take care of added registered callbacks that don't actually add anything at -O0.
Could you try configuring LLVM with -DLLVM_BYE_LINK_INTO_TOOLS=ON? I assume the test will break because it unconditionally adds a FunctionPass.
Thanks, I wasn't aware of this. Looks like other opt -passes='default<O#>' tests also look at if it present or not.
The following already fail with -DLLVM_BYE_LINK_INTO_TOOLS=ON
LLVM :: Other/opt-O0-pipeline.ll LLVM :: Other/opt-O2-pipeline.ll LLVM :: Other/opt-O3-pipeline-enable-matrix.ll LLVM :: Other/opt-O3-pipeline.ll LLVM :: Other/opt-Os-pipeline.ll
these are all legacy PM related.
AFAICT all issues were addressed.
@Meinersbur: are there any concerns left with this patch?
I'd prefer a solution that is agnostic to which pass plugins are compiled statically; however, this is also how Bye pass tests itself whether it us added to the pass pipeline too.
The following already fail with -DLLVM_BYE_LINK_INTO_TOOLS=ON
LLVM :: Other/opt-O0-pipeline.ll LLVM :: Other/opt-O2-pipeline.ll LLVM :: Other/opt-O3-pipeline-enable-matrix.ll LLVM :: Other/opt-O3-pipeline.ll LLVM :: Other/opt-Os-pipeline.llthese are all legacy PM related.
Probably a consequence of not being enabled by default. Tests do not get updated.
It should be possible to simplify this function a lot now. It currently duplicates nearly all logic (for sanitizers etc) between O0 and O1+. Or is that planned for a followup?