The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated alloca instructions as well as meaningless stores and
this behavior can leave unused (dead) arguments. To eliminate the dead
arguments and therefore let the DeadCodeElimination remove becoming dead
inserted GEPs as well as loads and casts in the callers, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.
Details
- Reviewers
aeubanks mtrofin fhahn yln serge-sans-paille nikic jdoerfert - Commits
- rG1c530500ab86: [Pipelines] Introduce DAE after ArgumentPromotion
rGb10a341aa5b0: [Pipelines] Introduce DAE after ArgumentPromotion
rG879f5118fc74: [Pipelines] Introduce DAE after ArgumentPromotion
rG3f20dcbf708c: [Pipelines] Introduce DAE after ArgumentPromotion
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There should be a PhaseOrdering test that shows that the argument is not removed in the current opimization pipeline.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | I don't think we need a late module pass for this, it can be part of the normal module optimization pipeline (post module simplification). |
Thank you for the suggestion. I've added the test (3b7650da725) and rebased the current patch onto main to modify the test here. Also, I've removed the example from the patch's description because it has actually been moved into the test.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | I'm afraid I didn't get your comment. What I tried is to add DAE with MIWP.addModulePass but such module passes run before call graph passes. |
Add a PhaseOrdering test to show how the patch affects dead argument elimination after argument promotion.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
731–733 | looks like it, usually it's better to use https://github.com/llvm-mirror/clang/blob/master/tools/clang-format/clang-format-diff.py |
Do we need to retain the run of DeadArgumentEliminationPass in the original position or is a single run at the new position sufficient?
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1611 | If these formatting changes make sense, I can introduce them in a separate patch. Should I? |
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1611 | They are fine here, I was just checking. Thanks! |
Good point! I tried and also removed the guard that the DAE pass should run with O3 only (currently it run exactly as the pass in the original position did: with any O > 0). I have a chance to run the LLVM :: Transforms tests only, one test failed - LLVM :: Transforms\InstCombine\unused-nonnull.ll:
error: CHECK-SAME: expected string not found in input ; CHECK-SAME: (i32 [[ARGC:%.*]], i8** nocapture readnone [[ARGV:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] { ^ <stdin>:7:17: note: scanning from here define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {
The difference is that the %argv argument has the readonly attribute, not readnone. I'm not sure whether this makes much sense (I guess this could because readnone can theoretically open a door for some optimizations for which readonly cannot).
I've submitted this change to let the buildbot run all the tests and see the difference.
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
775 | Oh, I think I have got the idea: instead of adding the DAE with MIWP.addLateModulePass() just after adding the ArgumentPromotion in the buildInlinerPipeline, to add DAE after this line with usual addPass: MPM.addPass(buildInlinerPipeline(Level, Phase)); This makes sense when we removing the original point of adding the DAE pass, because the buildInlinerPipeline() function is called not in every case. Actually, we just moving DAE after inlining. |
Try to remove the DAE from the original point. Also, apply the suggestion from @nikic - make the DAE a part of the normal module optimization pipeline (post module simplification).
moving DAE after the function simplification pipeline makes sense
this would be fixed by running PostOrderFunctionAttrsPass after the function simplification pipeline which is something I've wanted to do but never got around to testing. we should be inferring more precise attributes after fully simplifying functions. the only case that might regress is recursive functions since when visiting the recursive calls we haven't computed attributes for the recursive functions yet
this test is regressing with this patch because previously DAE would replace passed arguments with poison if the argument isn't used in the function, removing the use of %argv in @main and func-attrs would mark %argv as readnone, but with this patch func-attrs runs on @main before the use of %argv is eliminated. the call to @compute is eliminated in all cases in the function simplification pipeline due to inferring the returned attribute on %x, which is why running func-attrs after the function simplification pipeline fixes this issue
@@ -759,9 +758,6 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, if (AttributorRun & AttributorRunOption::CGSCC) MainCGPipeline.addPass(AttributorCGSCCPass()); - // Now deduce any function attributes based in the current code. - MainCGPipeline.addPass(PostOrderFunctionAttrsPass()); - // When at O3 add argument promotion to the pass pipeline. // FIXME: It isn't at all clear why this should be limited to O3. if (Level == OptimizationLevel::O3) @@ -781,6 +777,9 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level, buildFunctionSimplificationPipeline(Level, Phase), PTO.EagerlyInvalidateAnalyses, EnableNoRerunSimplificationPipeline)); + // Now deduce any function attributes based in the current code. + MainCGPipeline.addPass(PostOrderFunctionAttrsPass()); + MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
@aeubanks Thank you for the great explanation.
I've applied your suggestion and re-uploaded the patch. The test unused-nonnull.ll has been fixed with keeping the readnone attribute.
[Pipelines] Fix the Clang :: CodeGenCoroutines/coro-elide.cpp
Now, the %_Z5task1v.Frame type doesn't contain a field of the
%_Z5task0v.Frame one.
I'd tried moving func-attrs after the function simplification pipeline and perf-wise on internal benchmarks it looks fine, but it causes some compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=d3dd6e57fe84e90cadcdc78fa71d632f6573f156&to=592b8acec55f577713a5e1fb610a36c8742c682b&stat=instructions
My guess is something to do with caching/invalidating analyses but I'm not 100% sure
@aeubanks Hmm, if I correctly get your comment, I should revert this patch to the state before the proposed solution with moving the PostOrderFunctionAttrsPass at the end of the buildInlinerPipeline function regardless of the readonly instead of readnone regression. Personally along with your concern about compilation time, I have a concern about some changing in coroutines compilation, the Clang :: CodeGenCoroutines/coro-elide.cpp test demonstrates them:
// CHECK-NOT: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame
instead of
// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame
I've reverted the latest changes because they require more investigation and, as you said, should be introduced in a separate patch.
Return the PostOrderFunctionAttrsPass pass back on its original place in the pipeline.
I think even with the readnone -> readonly change this patch should be fine, but let me run some internal benchmarks on this patch
the func-attrs change can come later
@aeubanks Sorry for the late answer. Did you have a chance to run the benchmarks? If so, could you share the results?
sorry for the big delay, I did run the benchmarks and it looks fine
ran this through llvm-compile-time-tracker, looks good https://llvm-compile-time-tracker.com/compare.php?from=552b59b9e69fe1cb2b1ee0cb49cf8376a3dc0869&to=7af184006a9fd9d8deca5b7ae3625127fbb42535&stat=instructions
so I think we're good to go
@aeubanks Thank you very much for the benchmark results and patch review. I've landed the patch.
FYI, this broke the LLDB build bot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/46324/execution/node/74/log/
Looks like we're testing that inlined unused parameters display correctly...
AssertionError: '(void *) unused1 = <no location, value may have been optimized out>' not found in '(void *) unused1 = 0x000000016fdff4d0\n'
But with this patch DWARF contains this extra entry for the unused parameter:
0x00000045: DW_TAG_formal_parameter DW_AT_location (0x00000000: [0x0000000100003f1c, 0x0000000100003f20): DW_OP_reg0 W0 [0x0000000100003f20, 0x0000000100003f24): DW_OP_entry_value(DW_OP_reg0 W0), DW_OP_stack_value) DW_AT_abstract_origin (0x00000067 "unused1")
whereas previously it was,
0x00000045: DW_TAG_formal_parameter DW_AT_abstract_origin (0x00000061 "unused1")
Maybe a flaw in the test? Any idea if this is a debug-info regression?
@Michael137 Thank you very much for the information!
I'm not sure, but it looks like the introduced change of the readnone attribute to readonly might make impact on DWARF. Unfortunately, I have no idea should this changes in DWARF be fixed or just it is enough to actualize the test.
I've reverted the patch to give our time to make the decision about DWARF generation.
I tried to triage a bit. The test lldb\test\API\functionalities\unused-inlined-parameters\TestUnusedInlinedParameters.py compiles the code in main.c with -O1 and generates the following IR for the @f function:
; Function Attrs: alwaysinline nounwind uwtable define dso_local void @f(ptr nocapture noundef readnone %unused1, i32 noundef %used, i32 noundef %unused2) local_unnamed_addr #1 { entry: tail call void @use(i32 noundef %used) ret void }
With the reverted patch, the IR looks like the follow:
; Function Attrs: alwaysinline nounwind uwtable define dso_local void @f(ptr nocapture readnone %unused1, i32 noundef %used, i32 %unused2) local_unnamed_addr #1 { entry: tail call void @use(i32 noundef %used) ret void }
So, as we can see, the attribute readnone is present for the first unused argument for which an additional piece of the DWARF code is generated in both IRs but the noundef attribute is omitted in the second case (so, this attribute is introduces by some changes in the patch and this is "documented" by the changes in the llvm/test/Transforms/InstCombine/unused-nonnull.ll test). I believe this comment from @aeubanks can describe what happens on the IR level: D128830#3639373
I'd try compiling with clang -g2 to see the effects of this patch on the debug info in the IR
Based on that debug info it looks like the patch might've improved things - the 'previous' description has no location, the new one has a location (if it's correct - is there evidence it's incorrect?)
What was the expected behavior of the test? What's the new behavior? Oh, I can read the assertion now.
The assertion was that there is no location - but now there is a location. That looks like a good thing?
Maybe a flaw in the test? Any idea if this is a debug-info regression?
I think we can "fix" the test with the following patch:
diff --git a/lldb/test/API/functionalities/unused-inlined-parameters/main.c b/lldb/test/API/functionalities/unused-inlined-parameters/main.c index f2ef5dcc213d..9b9f95f6c946 100644 --- a/lldb/test/API/functionalities/unused-inlined-parameters/main.c +++ b/lldb/test/API/functionalities/unused-inlined-parameters/main.c @@ -7,6 +7,7 @@ __attribute__((always_inline)) void f(void *unused1, int used, int unused2) { } int main(int argc, char **argv) { - f(argv, 42, 1); + char *undefined; + f(undefined, 42, 1); return 0; -}
Made the change to the test. Confirmed it passes with and without the patch. Feel free to push again.
Thanks!
Colleagues, thank you for the discussion.
@aprantl @Michael137 Thank you very much for the patch and the test changes and confirmation. I've pushed the patch again.
This triggers failed asserts for me:
$ cat repro.c static float strtof(char *, char *) {} void a() { strtof(a, 0); } $ clang -target x86_64-w64-mingw32 -w -c repro.c -O3 clang: ../lib/Analysis/CGSCCPassManager.cpp:958: updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::CGSCCAnalysisManager&, llvm::CGSCCUpdateResult&, llvm::FunctionAnalysisManager&, bool)::<lambda(llvm::Function&)>: Assertion `RefereeN && "Visited function should already have an associated node"' failed.
@mstorsjo Thank you very much for the information. Unfortunately, our tests didn't catch this problem.
I've reproduced this on Windows even w/o mingw. Some time is required for triaging.
We also saw this assert on our Windows build, and it also can be reproduced in Linux:
$ cat test2.c static char *getenv(char *) {} void foo() { getenv(""); } $ ~/src/upstream/879f5118fc74657e4a5c4eff6810098e1eed75ac-linux/bin/clang -c -O3 test2.c test2.c:1:27: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions] static char *getenv(char *) {} ^ test2.c:1:30: warning: non-void function does not return a value [-Wreturn-type] static char *getenv(char *) {} ^ clang: /home/dyung/src/upstream/llvm_clean_git/llvm/lib/Analysis/CGSCCPassManager.cpp:958: updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::CGSCCAnalysisManager&, llvm::CGSCCUpdateResult&, llvm::FunctionAnalysisManager&, bool)::<lambda(llvm::Function&)>: Assertion `RefereeN && "Visited function should already have an associated node"' failed.
The key seems to be the special function getenv(). If I rename the function, the crash does not occur.
Note that our internal builder found this while trying to build compiler-rt/lib/profile/InstrProfilingFile.c using the newly built compiler. If you think this might take a while to debug, could you please revert the change while you investigate?
No problem, I've reverted the commit while I need some time to build clang with the reverted commit even to make it clear the commit is guilty.
I'm sorry. It's very interesting, in @mstorsjo case, a function from the standard C library is used: strtof. When I renamed it in the test into strtof2, the problem has becomes not reproducible.
@Michael137 It looks like some DWARF generation/debugger test can fail after the revertion.
Noted, thanks for the heads up
AFAIK, the only test that'd now fail on the debug-info side is: https://reviews.llvm.org/D132664
reduced:
define void @a() { entry: %call = call float @strtof(ptr noundef null, ptr noundef null) ret void } define internal float @strtof(ptr noundef %0, ptr noundef %1) nounwind { entry: ret float 0.0 }
./build/rel/bin/opt -passes='inline,argpromotion' -disable-output /tmp/b.ll
likely something to do with how the CGSCC pass manager handles lib function (see isKnownLibFunction in LazyCallGraph.cpp)
I've run into https://github.com/llvm/llvm-project/issues/56503 after this patch so I've reverted this for now while I'm fixing that. Sorry for this patch exposing so many pre-existing issues. Will reland after fixing.
that was fixed and I've relanded the patch
hopefully there aren't any more CGSCC issues this uncovers
are these changes due to running clang-format (i.e. the previous change didn't?)