The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated alloca instructions as well as meaningless stores and
this behavior can leave unused (dead) arguments. To eliminate the dead
arguments and therefore let the DeadCodeElimination remove becoming dead
inserted GEPs as well as loads and casts in the callers, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Time | Test | |
---|---|---|
60,060 ms | x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp Script:
--
: 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -msse4.2 -gline-tables-only -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -O1 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp && not /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp 2>&1 | FileCheck /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp
|
Event Timeline
There should be a PhaseOrdering test that shows that the argument is not removed in the current opimization pipeline.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | I don't think we need a late module pass for this, it can be part of the normal module optimization pipeline (post module simplification). |
Thank you for the suggestion. I've added the test (3b7650da725) and rebased the current patch onto main to modify the test here. Also, I've removed the example from the patch's description because it has actually been moved into the test.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | I'm afraid I didn't get your comment. What I tried is to add DAE with MIWP.addModulePass but such module passes run before call graph passes. |
Add a PhaseOrdering test to show how the patch affects dead argument elimination after argument promotion.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
731–733 | looks like it, usually it's better to use https://github.com/llvm-mirror/clang/blob/master/tools/clang-format/clang-format-diff.py |
Do we need to retain the run of DeadArgumentEliminationPass in the original position or is a single run at the new position sufficient?
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1611 | If these formatting changes make sense, I can introduce them in a separate patch. Should I? |
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1611 | They are fine here, I was just checking. Thanks! |
Good point! I tried and also removed the guard that the DAE pass should run with O3 only (currently it run exactly as the pass in the original position did: with any O > 0). I have a chance to run the LLVM :: Transforms tests only, one test failed - LLVM :: Transforms\InstCombine\unused-nonnull.ll:
error: CHECK-SAME: expected string not found in input ; CHECK-SAME: (i32 [[ARGC:%.*]], i8** nocapture readnone [[ARGV:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] { ^ <stdin>:7:17: note: scanning from here define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {
The difference is that the %argv argument has the readonly attribute, not readnone. I'm not sure whether this makes much sense (I guess this could because readnone can theoretically open a door for some optimizations for which readonly cannot).
I've submitted this change to let the buildbot run all the tests and see the difference.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | Oh, I think I have got the idea: instead of adding the DAE with MIWP.addLateModulePass() just after adding the ArgumentPromotion in the buildInlinerPipeline, to add DAE after this line with usual addPass: MPM.addPass(buildInlinerPipeline(Level, Phase)); This makes sense when we removing the original point of adding the DAE pass, because the buildInlinerPipeline() function is called not in every case. Actually, we just moving DAE after inlining. |
Try to remove the DAE from the original point. Also, apply the suggestion from @nikic - make the DAE a part of the normal module optimization pipeline (post module simplification).
moving DAE after the function simplification pipeline makes sense
this would be fixed by running PostOrderFunctionAttrsPass after the function simplification pipeline which is something I've wanted to do but never got around to testing. we should be inferring more precise attributes after fully simplifying functions. the only case that might regress is recursive functions since when visiting the recursive calls we haven't computed attributes for the recursive functions yet
this test is regressing with this patch because previously DAE would replace passed arguments with poison if the argument isn't used in the function, removing the use of %argv in @main and func-attrs would mark %argv as readnone, but with this patch func-attrs runs on @main before the use of %argv is eliminated. the call to @compute is eliminated in all cases in the function simplification pipeline due to inferring the returned attribute on %x, which is why running func-attrs after the function simplification pipeline fixes this issue
@@ -759,9 +758,6 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, if (AttributorRun & AttributorRunOption::CGSCC) MainCGPipeline.addPass(AttributorCGSCCPass()); - // Now deduce any function attributes based in the current code. - MainCGPipeline.addPass(PostOrderFunctionAttrsPass()); - // When at O3 add argument promotion to the pass pipeline. // FIXME: It isn't at all clear why this should be limited to O3. if (Level == OptimizationLevel::O3) @@ -781,6 +777,9 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, buildFunctionSimplificationPipeline(Level, Phase), PTO.EagerlyInvalidateAnalyses, EnableNoRerunSimplificationPipeline)); + // Now deduce any function attributes based in the current code. + MainCGPipeline.addPass(PostOrderFunctionAttrsPass()); + MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
@aeubanks Thank you for the great explanation.
I've applied your suggestion and re-uploaded the patch. The test unused-nonnull.ll has been fixed with keeping the readnone attribute.
[Pipelines] Fix the Clang :: CodeGenCoroutines/coro-elide.cpp
Now, the %_Z5task1v.Frame type doesn't contain a field of the
%_Z5task0v.Frame one.
I'd tried moving func-attrs after the function simplification pipeline and perf-wise on internal benchmarks it looks fine, but it causes some compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=d3dd6e57fe84e90cadcdc78fa71d632f6573f156&to=592b8acec55f577713a5e1fb610a36c8742c682b&stat=instructions
My guess is something to do with caching/invalidating analyses but I'm not 100% sure
@aeubanks Hmm, if I correctly get your comment, I should revert this patch to the state before the proposed solution with moving the PostOrderFunctionAttrsPass at the end of the buildInlinerPipeline function regardless of the readonly instead of readnone regression. Personally along with your concern about compilation time, I have a concern about some changing in coroutines compilation, the Clang :: CodeGenCoroutines/coro-elide.cpp test demonstrates them:
// CHECK-NOT: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame
instead of
// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame
I've reverted the latest changes because they require more investigation and, as you said, should be introduced in a separate patch.
Return the PostOrderFunctionAttrsPass pass back on its original place in the pipeline.
I think even with the readnone -> readonly change this patch should be fine, but let me run some internal benchmarks on this patch
the func-attrs change can come later