This is an archive of the discontinued LLVM Phabricator instance.

[Pipelines] Introduce DAE after ArgumentPromotion
ClosedPublic

Authored by psamolysov on Jun 29 2022, 8:25 AM.

Details

Summary

The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated alloca instructions as well as meaningless stores and
this behavior can leave unused (dead) arguments. To eliminate the dead
arguments and therefore let the DeadCodeElimination remove becoming dead
inserted GEPs as well as loads and casts in the callers, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.

Diff Detail

Event Timeline

psamolysov created this revision.Jun 29 2022, 8:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 8:25 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
psamolysov requested review of this revision.Jun 29 2022, 8:25 AM
nikic added a comment.Jun 30 2022, 1:08 AM

There should be a PhaseOrdering test that shows that the argument is not removed in the current opimization pipeline.

llvm/lib/Passes/PassBuilderPipelines.cpp
772

I don't think we need a late module pass for this, it can be part of the normal module optimization pipeline (post module simplification).

There should be a PhaseOrdering test that shows that the argument is not removed in the current opimization pipeline.

Thank you for the suggestion. I've added the test (3b7650da725) and rebased the current patch onto main to modify the test here. Also, I've removed the example from the patch's description because it has actually been moved into the test.

llvm/lib/Passes/PassBuilderPipelines.cpp
772

I'm afraid I didn't get your comment. What I tried is to add DAE with MIWP.addModulePass but such module passes run before call graph passes.

psamolysov edited the summary of this revision. (Show Details)

Add a PhaseOrdering test to show how the patch affects dead argument elimination after argument promotion.

psamolysov planned changes to this revision.Jun 30 2022, 4:54 AM
psamolysov added a comment.EditedJun 30 2022, 4:54 AM

A number of tests failed, I'm going to fix them ASAP.

Actualize the new PM tests.

mtrofin added inline comments.Jun 30 2022, 7:50 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
731–733

are these changes due to running clang-format (i.e. the previous change didn't?)

1611

same clang-format q

1689

spurious change?

1712

was this clang-format?

fhahn added inline comments.Jun 30 2022, 7:52 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
731–733
fhahn added a comment.Jun 30 2022, 7:53 AM

Do we need to retain the run of DeadArgumentEliminationPass in the original position or is a single run at the new position sufficient?

psamolysov added inline comments.Jun 30 2022, 7:57 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
731–733

@mtrofin Yes, this change is the result of a run of clang-format as well as the previous one.

@fhahn Thank you for the suggestion.

1611

Should I revert the changes introduced by clang-format?

1689

This change is introduced by clang-format. Should there be two empty lines between MainFPM.addPass(MergedLoadStoreMotionPass()); and if (EnableConstraintElimination)?

1712

Yes, it was.

psamolysov added inline comments.Jun 30 2022, 8:03 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1611

If these formatting changes make sense, I can introduce them in a separate patch. Should I?

mtrofin added inline comments.Jun 30 2022, 8:05 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1611

They are fine here, I was just checking. Thanks!

Do we need to retain the run of DeadArgumentEliminationPass in the original position or is a single run at the new position sufficient?

Good point! I tried and also removed the guard that the DAE pass should run with O3 only (currently it run exactly as the pass in the original position did: with any O > 0). I have a chance to run the LLVM :: Transforms tests only, one test failed - LLVM :: Transforms\InstCombine\unused-nonnull.ll:

error: CHECK-SAME: expected string not found in input
; CHECK-SAME: (i32 [[ARGC:%.*]], i8** nocapture readnone [[ARGV:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
              ^
<stdin>:7:17: note: scanning from here
define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {

The difference is that the %argv argument has the readonly attribute, not readnone. I'm not sure whether this makes much sense (I guess this could because readnone can theoretically open a door for some optimizations for which readonly cannot).

I've submitted this change to let the buildbot run all the tests and see the difference.

llvm/lib/Passes/PassBuilderPipelines.cpp
772

Oh, I think I have got the idea: instead of adding the DAE with MIWP.addLateModulePass() just after adding the ArgumentPromotion in the buildInlinerPipeline, to add DAE after this line with usual addPass:

MPM.addPass(buildInlinerPipeline(Level, Phase));

This makes sense when we removing the original point of adding the DAE pass, because the buildInlinerPipeline() function is called not in every case. Actually, we just moving DAE after inlining.

Try to remove the DAE from the original point. Also, apply the suggestion from @nikic - make the DAE a part of the normal module optimization pipeline (post module simplification).

Fix Clang :: CodeGen/thinlto-distributed-newpm.ll

Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 11:13 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

moving DAE after the function simplification pipeline makes sense

Do we need to retain the run of DeadArgumentEliminationPass in the original position or is a single run at the new position sufficient?

Good point! I tried and also removed the guard that the DAE pass should run with O3 only (currently it run exactly as the pass in the original position did: with any O > 0). I have a chance to run the LLVM :: Transforms tests only, one test failed - LLVM :: Transforms\InstCombine\unused-nonnull.ll:

error: CHECK-SAME: expected string not found in input
; CHECK-SAME: (i32 [[ARGC:%.*]], i8** nocapture readnone [[ARGV:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
              ^
<stdin>:7:17: note: scanning from here
define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #0 {

The difference is that the %argv argument has the readonly attribute, not readnone. I'm not sure whether this makes much sense (I guess this could because readnone can theoretically open a door for some optimizations for which readonly cannot).

this would be fixed by running PostOrderFunctionAttrsPass after the function simplification pipeline which is something I've wanted to do but never got around to testing. we should be inferring more precise attributes after fully simplifying functions. the only case that might regress is recursive functions since when visiting the recursive calls we haven't computed attributes for the recursive functions yet

this test is regressing with this patch because previously DAE would replace passed arguments with poison if the argument isn't used in the function, removing the use of %argv in @main and func-attrs would mark %argv as readnone, but with this patch func-attrs runs on @main before the use of %argv is eliminated. the call to @compute is eliminated in all cases in the function simplification pipeline due to inferring the returned attribute on %x, which is why running func-attrs after the function simplification pipeline fixes this issue

@@ -759,9 +758,6 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   if (AttributorRun & AttributorRunOption::CGSCC)
     MainCGPipeline.addPass(AttributorCGSCCPass());
 
-  // Now deduce any function attributes based in the current code.
-  MainCGPipeline.addPass(PostOrderFunctionAttrsPass());
-
   // When at O3 add argument promotion to the pass pipeline.
   // FIXME: It isn't at all clear why this should be limited to O3.
   if (Level == OptimizationLevel::O3)
@@ -781,6 +777,9 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
       buildFunctionSimplificationPipeline(Level, Phase),
       PTO.EagerlyInvalidateAnalyses, EnableNoRerunSimplificationPipeline));
 
+  // Now deduce any function attributes based in the current code.
+  MainCGPipeline.addPass(PostOrderFunctionAttrsPass());
+
   MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));

@aeubanks Thank you for the great explanation.

I've applied your suggestion and re-uploaded the patch. The test unused-nonnull.ll has been fixed with keeping the readnone attribute.

[Pipelines] Fix the Clang :: CodeGenCoroutines/coro-elide.cpp

Now, the %_Z5task1v.Frame type doesn't contain a field of the
%_Z5task0v.Frame one.

I'd tried moving func-attrs after the function simplification pipeline and perf-wise on internal benchmarks it looks fine, but it causes some compile time regressions: https://llvm-compile-time-tracker.com/compare.php?from=d3dd6e57fe84e90cadcdc78fa71d632f6573f156&to=592b8acec55f577713a5e1fb610a36c8742c682b&stat=instructions
My guess is something to do with caching/invalidating analyses but I'm not 100% sure

(and we should definitely separate that out into its own patch first)

@aeubanks Hmm, if I correctly get your comment, I should revert this patch to the state before the proposed solution with moving the PostOrderFunctionAttrsPass at the end of the buildInlinerPipeline function regardless of the readonly instead of readnone regression. Personally along with your concern about compilation time, I have a concern about some changing in coroutines compilation, the Clang :: CodeGenCoroutines/coro-elide.cpp test demonstrates them:

// CHECK-NOT: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

instead of

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

I've reverted the latest changes because they require more investigation and, as you said, should be introduced in a separate patch.

Return the PostOrderFunctionAttrsPass pass back on its original place in the pipeline.

I think even with the readnone -> readonly change this patch should be fine, but let me run some internal benchmarks on this patch
the func-attrs change can come later

... but let me run some internal benchmarks on this patch

@aeubanks Sorry for the late answer. Did you have a chance to run the benchmarks? If so, could you share the results?

aeubanks accepted this revision.Aug 23 2022, 1:35 PM

sorry for the big delay, I did run the benchmarks and it looks fine

ran this through llvm-compile-time-tracker, looks good https://llvm-compile-time-tracker.com/compare.php?from=552b59b9e69fe1cb2b1ee0cb49cf8376a3dc0869&to=7af184006a9fd9d8deca5b7ae3625127fbb42535&stat=instructions

so I think we're good to go

This revision is now accepted and ready to land.Aug 23 2022, 1:35 PM
This revision was landed with ongoing or failed builds.Aug 24 2022, 12:38 AM
This revision was automatically updated to reflect the committed changes.
psamolysov added a comment.EditedAug 24 2022, 12:44 AM

@aeubanks Thank you very much for the benchmark results and patch review. I've landed the patch.

Michael137 added a subscriber: Michael137.EditedAug 24 2022, 2:33 AM

FYI, this broke the LLDB build bot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/46324/execution/node/74/log/

Looks like we're testing that inlined unused parameters display correctly...

AssertionError: '(void *) unused1 = <no location, value may have been optimized out>' not found in '(void *) unused1 = 0x000000016fdff4d0\n'

But with this patch DWARF contains this extra entry for the unused parameter:

0x00000045:     DW_TAG_formal_parameter                                                                             
                  DW_AT_location    (0x00000000:                                                                    
                     [0x0000000100003f1c, 0x0000000100003f20): DW_OP_reg0 W0                                        
                     [0x0000000100003f20, 0x0000000100003f24): DW_OP_entry_value(DW_OP_reg0 W0), DW_OP_stack_value) 
                  DW_AT_abstract_origin (0x00000067 "unused1")

whereas previously it was,

0x00000045:     DW_TAG_formal_parameter                       
                  DW_AT_abstract_origin (0x00000061 "unused1")

Maybe a flaw in the test? Any idea if this is a debug-info regression?

@Michael137 Thank you very much for the information!

I'm not sure, but it looks like the introduced change of the readnone attribute to readonly might make impact on DWARF. Unfortunately, I have no idea should this changes in DWARF be fixed or just it is enough to actualize the test.

I've reverted the patch to give our time to make the decision about DWARF generation.

Michael137 added subscribers: aprantl, dblaikie.EditedAug 24 2022, 2:59 AM

@Michael137 Thank you very much for the information!

I'm not sure, but it looks like the introduced change of the readnone attribute to readonly might make impact on DWARF. Unfortunately, I have no idea should this changes in DWARF be fixed or just it is enough to actualize the test.

I've reverted the patch to give our time to make the decision about DWARF generation.

Thanks!

@aprantl @dblaikie Looks like this needs to accommodate existing DWARF generation behaviour?

FYI, this test compiles with -O1, presumably expecting unused params to get optimised out (and not having a location attr in DWARF).

psamolysov added a comment.EditedAug 24 2022, 3:13 AM

I tried to triage a bit. The test lldb\test\API\functionalities\unused-inlined-parameters\TestUnusedInlinedParameters.py compiles the code in main.c with -O1 and generates the following IR for the @f function:

; Function Attrs: alwaysinline nounwind uwtable
define dso_local void @f(ptr nocapture noundef readnone %unused1, i32 noundef %used, i32 noundef %unused2) local_unnamed_addr #1 {
entry:
  tail call void @use(i32 noundef %used)
  ret void
}

With the reverted patch, the IR looks like the follow:

; Function Attrs: alwaysinline nounwind uwtable
define dso_local void @f(ptr nocapture readnone %unused1, i32 noundef %used, i32 %unused2) local_unnamed_addr #1 {
entry:
  tail call void @use(i32 noundef %used)
  ret void
}

So, as we can see, the attribute readnone is present for the first unused argument for which an additional piece of the DWARF code is generated in both IRs but the noundef attribute is omitted in the second case (so, this attribute is introduces by some changes in the patch and this is "documented" by the changes in the llvm/test/Transforms/InstCombine/unused-nonnull.ll test). I believe this comment from @aeubanks can describe what happens on the IR level: D128830#3639373

I'd try compiling with clang -g2 to see the effects of this patch on the debug info in the IR

FYI, this broke the LLDB build bot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/46324/execution/node/74/log/

Looks like we're testing that inlined unused parameters display correctly...

AssertionError: '(void *) unused1 = <no location, value may have been optimized out>' not found in '(void *) unused1 = 0x000000016fdff4d0\n'

But with this patch DWARF contains this extra entry for the unused parameter:

0x00000045:     DW_TAG_formal_parameter                                                                             
                  DW_AT_location    (0x00000000:                                                                    
                     [0x0000000100003f1c, 0x0000000100003f20): DW_OP_reg0 W0                                        
                     [0x0000000100003f20, 0x0000000100003f24): DW_OP_entry_value(DW_OP_reg0 W0), DW_OP_stack_value) 
                  DW_AT_abstract_origin (0x00000067 "unused1")

whereas previously it was,

0x00000045:     DW_TAG_formal_parameter                       
                  DW_AT_abstract_origin (0x00000061 "unused1")

Based on that debug info it looks like the patch might've improved things - the 'previous' description has no location, the new one has a location (if it's correct - is there evidence it's incorrect?)

What was the expected behavior of the test? What's the new behavior? Oh, I can read the assertion now.

The assertion was that there is no location - but now there is a location. That looks like a good thing?

Maybe a flaw in the test? Any idea if this is a debug-info regression?

I think we can "fix" the test with the following patch:

diff --git a/lldb/test/API/functionalities/unused-inlined-parameters/main.c b/lldb/test/API/functionalities/unused-inlined-parameters/main.c
index f2ef5dcc213d..9b9f95f6c946 100644
--- a/lldb/test/API/functionalities/unused-inlined-parameters/main.c
+++ b/lldb/test/API/functionalities/unused-inlined-parameters/main.c
@@ -7,6 +7,7 @@ __attribute__((always_inline)) void f(void *unused1, int used, int unused2) {
 }
 
 int main(int argc, char **argv) {
-  f(argv, 42, 1);
+  char *undefined;
+  f(undefined, 42, 1);
   return 0;
-}
Michael137 added a comment.EditedAug 25 2022, 12:52 AM

I think we can "fix" the test with the following patch:

diff --git a/lldb/test/API/functionalities/unused-inlined-parameters/main.c b/lldb/test/API/functionalities/unused-inlined-parameters/main.c
index f2ef5dcc213d..9b9f95f6c946 100644
--- a/lldb/test/API/functionalities/unused-inlined-parameters/main.c
+++ b/lldb/test/API/functionalities/unused-inlined-parameters/main.c
@@ -7,6 +7,7 @@ __attribute__((always_inline)) void f(void *unused1, int used, int unused2) {
 }
 
 int main(int argc, char **argv) {
-  f(argv, 42, 1);
+  char *undefined;
+  f(undefined, 42, 1);
   return 0;
-}

Made the change to the test. Confirmed it passes with and without the patch. Feel free to push again.

Thanks!

Colleagues, thank you for the discussion.

@aprantl @Michael137 Thank you very much for the patch and the test changes and confirmation. I've pushed the patch again.

This triggers failed asserts for me:

$ cat repro.c 
static float strtof(char *, char *) {}
void a() { strtof(a, 0); }
$ clang -target x86_64-w64-mingw32 -w -c repro.c -O3
clang: ../lib/Analysis/CGSCCPassManager.cpp:958: updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::CGSCCAnalysisManager&, llvm::CGSCCUpdateResult&, llvm::FunctionAnalysisManager&, bool)::<lambda(llvm::Function&)>: Assertion `RefereeN && "Visited function should already have an associated node"' failed.

@mstorsjo Thank you very much for the information. Unfortunately, our tests didn't catch this problem.

I've reproduced this on Windows even w/o mingw. Some time is required for triaging.

dyung added a subscriber: dyung.Aug 26 2022, 3:25 AM

We also saw this assert on our Windows build, and it also can be reproduced in Linux:

$ cat test2.c
static char *getenv(char *) {}
void foo() { getenv(""); }
$ ~/src/upstream/879f5118fc74657e4a5c4eff6810098e1eed75ac-linux/bin/clang -c -O3 test2.c                                  
test2.c:1:27: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]                                                                                                                                   
static char *getenv(char *) {}                                                                                                                                                                                                                      
                          ^                                                                                                                                                                                                                         
test2.c:1:30: warning: non-void function does not return a value [-Wreturn-type]                                                                                                                                                                    
static char *getenv(char *) {}                                                                                                                                                                                                                      
                             ^                                                                                            
clang: /home/dyung/src/upstream/llvm_clean_git/llvm/lib/Analysis/CGSCCPassManager.cpp:958: updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::CGSCCAnalysisManager&, llvm::CGSCCUpdateResult&, llvm::FunctionAnalysisManager&, bool)::<lambda(llvm::Function&)>: Assertion `RefereeN && "Visited function should already have an associated node"' failed.

The key seems to be the special function getenv(). If I rename the function, the crash does not occur.

dyung added a comment.Aug 26 2022, 3:37 AM

We also saw this assert on our Windows build, and it also can be reproduced in Linux:

$ cat test2.c
static char *getenv(char *) {}
void foo() { getenv(""); }
$ ~/src/upstream/879f5118fc74657e4a5c4eff6810098e1eed75ac-linux/bin/clang -c -O3 test2.c                                  
test2.c:1:27: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]                                                                                                                                   
static char *getenv(char *) {}                                                                                                                                                                                                                      
                          ^                                                                                                                                                                                                                         
test2.c:1:30: warning: non-void function does not return a value [-Wreturn-type]                                                                                                                                                                    
static char *getenv(char *) {}                                                                                                                                                                                                                      
                             ^                                                                                            
clang: /home/dyung/src/upstream/llvm_clean_git/llvm/lib/Analysis/CGSCCPassManager.cpp:958: updateCGAndAnalysisManagerForPass(llvm::LazyCallGraph&, llvm::LazyCallGraph::SCC&, llvm::LazyCallGraph::Node&, llvm::CGSCCAnalysisManager&, llvm::CGSCCUpdateResult&, llvm::FunctionAnalysisManager&, bool)::<lambda(llvm::Function&)>: Assertion `RefereeN && "Visited function should already have an associated node"' failed.

The key seems to be the special function getenv(). If I rename the function, the crash does not occur.

Note that our internal builder found this while trying to build compiler-rt/lib/profile/InstrProfilingFile.c using the newly built compiler. If you think this might take a while to debug, could you please revert the change while you investigate?

psamolysov added a comment.EditedAug 26 2022, 3:45 AM

No problem, I've reverted the commit while I need some time to build clang with the reverted commit even to make it clear the commit is guilty.

I'm sorry. It's very interesting, in @mstorsjo case, a function from the standard C library is used: strtof. When I renamed it in the test into strtof2, the problem has becomes not reproducible.

@Michael137 It looks like some DWARF generation/debugger test can fail after the revertion.

Noted, thanks for the heads up

AFAIK, the only test that'd now fail on the debug-info side is: https://reviews.llvm.org/D132664

reduced:

define void @a() {
entry:
  %call = call float @strtof(ptr noundef null, ptr noundef null)
  ret void
}

define internal float @strtof(ptr noundef %0, ptr noundef %1) nounwind {
entry:
  ret float 0.0
}

./build/rel/bin/opt -passes='inline,argpromotion' -disable-output /tmp/b.ll

likely something to do with how the CGSCC pass manager handles lib function (see isKnownLibFunction in LazyCallGraph.cpp)

aeubanks reopened this revision.Aug 26 2022, 2:20 PM
This revision is now accepted and ready to land.Aug 26 2022, 2:20 PM

@aeubanks Thank you for the investigation! I believe this patch can be re-landed after D132764 is committed.

This revision was automatically updated to reflect the committed changes.
This comment was removed by zequanwu.
aeubanks added a comment.EditedSep 1 2022, 8:53 AM

I've run into https://github.com/llvm/llvm-project/issues/56503 after this patch so I've reverted this for now while I'm fixing that. Sorry for this patch exposing so many pre-existing issues. Will reland after fixing.

that was fixed and I've relanded the patch

hopefully there aren't any more CGSCC issues this uncovers

@aeubanks Thank you very much for the re-landing.