This is an archive of the discontinued LLVM Phabricator instance.

[clang][NewPM] Add -fno-experimental-new-pass-manager to tests
ClosedPublic

Authored by leonardchan on Jun 11 2019, 12:02 PM.

Details

Summary

As per the discussion on D58375, we disable test that have optimizations under the new PM. This patch adds -fno-experimental-new-pass-manager to RUNS that:

  • Already run with optimizations (-O1 or higher) that were missed in D58375.
  • Explicitly test new PM behavior along side some new PM RUNS, but are missing this flag if new PM is enabled by default.
  • Specify -O without the number. Based on getOptimizationLevel(), it seems the default is 2, and the IR appears to be the same when changed to -O2, so update the test to explicitly say -O2 and provide -fno-experimental-new-pass-manager`.

Diff Detail

Repository
rL LLVM

Event Timeline

leonardchan created this revision.Jun 11 2019, 12:02 PM

I understand the change to explicitly say -O2. I also understand the change to add an explicit -fno-experimental-new-pass-manager to a RUN line when we have another RUN line that explicitly uses -fexperiemntal-new-pass-manager.

But for many of these cases, there is no new-PM RUN line in the test. If we're going to fix one RUN line to a particular pass manager, we should fix both IMO unless there is something quite odd about the IR being tested that makes the result of optimizinng be weirdly different between the two pass managers. That would be somewhat surprising, so I kinda want to know if we actually see that so often to understand what is causing this divergence.

In general, Clang tests shouldn't be checking such complex optimizations that the differences between the pass managers really manifests.....

Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2019, 10:46 AM

I did some more digging and found the following differences between PMs for each test, and they seem to all differ and can be fixed for different reasons.

CodeGen/aggregate-assign-call.c: The new PM on -O1 codegen produces the do/while loop differently but still emits the lifetime start/end intrinsics around temporary variables, which is what the test checks for. Here we can just check for the instructions while ignoring the branches generated.

CodeGen/arm_acle.c: A bunch of instructions seem to have been combined into a call to llvm.fshl.i32, so we just check for those under the new PM.

CodeGen/available-externally-suppress.c: available_externally was not emitted during -O2 -flto runs when it should still be retained for link time inlining purposes. This can be fixed by checking that we aren't LTOPrelinking when adding the EliminateAvailableExternallyPass.

CodeGen/pgo-sample.c [No new fix]: The test checks that PruneEH runs, but there doesn’t seem to be a corresponding new PM pass for it. Should there be? If there is one then we can just check for that in the debug PM dump.

CodeGen/x86_64-instrument-functions.c [No new fix]: This one's a bit complicated. The initial problem seems to be that EntryExitInstrumenterPass was not added into the pipeline to emit __cyg_profile_func_enter/exit() calls. Adding that to the pipeline seems to instrument correctly now and add these calls, but we run into a problem with inlining. root() expects 2 calls to __cyg_profile_func_enter and 2 calls to __cyg_profile_func_exit in its body,

; Function Attrs: nounwind
define i32 @root(i32 returned %x) #0 {
entry:
  %0 = tail call i8* @llvm.returnaddress(i32 0)
  tail call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @root to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @leaf to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @leaf to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @root to i8*), i8* %0) #2
  ret i32 %x
}

but we get only 1 call

; Function Attrs: norecurse nounwind readnone
define i32 @root(i32 returned %x) local_unnamed_addr #0 {
entry:
  %0 = call i8* @llvm.returnaddress(i32 0)
  call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @root to i8*), i8* %0)
  %1 = call i8* @llvm.returnaddress(i32 0)
  call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @root to i8*), i8* %1)
  ret i32 %x
}

I suspect this is due to the leaf() being inlined into root() even though inlining should happen after it. The legacy EntryExitInstrumenter pass seems to be added to the legacy FunctionPassManager first before all others passes. Under the legacy PM, when this pass runs, root() should look like:

; Function Attrs: nounwind
define i32 @root(i32 %x) #1 {
entry:
  %x.addr = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4, !tbaa !2
  %0 = load i32, i32* %x.addr, align 4, !tbaa !2
  %call = call i32 @leaf(i32 %0)  ; this call is not yet inlined
  ret i32 %call
}

but root() looks like this in the new PM pass:

; Function Attrs: norecurse nounwind readnone
define i32 @root(i32 returned %x) local_unnamed_addr #1 {
entry:
  ret i32 %x
}

CodeGenCXX/auto-var-init.cpp: Auto var initialization is performed differently under new PM than legacy. With initialization for structs with default values, the legacy PM will store the stream of 0xAAs, then initialize the members with default values in a constructor, whereas the new PM will just store the correct value for the whole struct right away without initialization. In one case, it seems that ctor function was also inlined. Fixed by check accounting for these checks separately.

CodeGenCXX/conditional-temporaries.cpp: The new pm seems to be unable to statically determine the return value of some functions. For now I just add separate checks for the new PM IR.

CodeGenCXX/member-function-pointer-calls.cpp: Same as CodeGenCXX/conditional-temporaries.cpp. In this case, the new PM codegen also contains calls to lifetime start/end intrinsics. Perhaps this time it could be due to the intrinsics not being optimized out?

CodeGenObjC/os_log.m: Legacy test expects argument of ptrtoint and some functions to be a specific type and argument, though the new PM codegen uses a different type and argument which are also valid since @llvm.objc.retainAutoreleasedReturnValue(val) always returns val.

CodeGenObjCXX/os_log.mm: A function seems to have been inlined under the new PM. Here we just prevent inlining for the new PM run since the test expects this specific function call.

Misc/pr32207.c [No new fix]: The banner is something that appears only in the legacy PM. I think the test only checks that the banner is printed and not so much that the particular pass is run.

I did some more digging and found the following differences between PMs for each test, and they seem to all differ and can be fixed for different reasons.

Awesome, and sorry this is such a big effort, but I think these are all going to end up being pretty interesting root causes. I guess yay, Clang's testing is doing its job?

CodeGen/aggregate-assign-call.c: The new PM on -O1 codegen produces the do/while loop differently but still emits the lifetime start/end intrinsics around temporary variables, which is what the test checks for. Here we can just check for the instructions while ignoring the branches generated.

Weird, but makes sense.

CodeGen/arm_acle.c: A bunch of instructions seem to have been combined into a call to llvm.fshl.i32, so we just check for those under the new PM.

Same.

CodeGen/available-externally-suppress.c: available_externally was not emitted during -O2 -flto runs when it should still be retained for link time inlining purposes. This can be fixed by checking that we aren't LTOPrelinking when adding the EliminateAvailableExternallyPass.

Yikes, this is a good bug to fix!

CodeGen/pgo-sample.c [No new fix]: The test checks that PruneEH runs, but there doesn’t seem to be a corresponding new PM pass for it. Should there be? If there is one then we can just check for that in the debug PM dump.

The analog would be function-attrs and function(simplify-cfg) in some combination. Maybe this should just check that some specific thing is optimized away at -O2 rather than checking the specific details of pass pipeline construction?

CodeGen/x86_64-instrument-functions.c [No new fix]: This one's a bit complicated. The initial problem seems to be that EntryExitInstrumenterPass was not added into the pipeline to emit __cyg_profile_func_enter/exit() calls. Adding that to the pipeline seems to instrument correctly now and add these calls, but we run into a problem with inlining. root() expects 2 calls to __cyg_profile_func_enter and 2 calls to __cyg_profile_func_exit in its body,

; Function Attrs: nounwind
define i32 @root(i32 returned %x) #0 {
entry:
  %0 = tail call i8* @llvm.returnaddress(i32 0)
  tail call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @root to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @leaf to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @leaf to i8*), i8* %0) #2
  tail call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @root to i8*), i8* %0) #2
  ret i32 %x
}

but we get only 1 call

; Function Attrs: norecurse nounwind readnone
define i32 @root(i32 returned %x) local_unnamed_addr #0 {
entry:
  %0 = call i8* @llvm.returnaddress(i32 0)
  call void @__cyg_profile_func_enter(i8* bitcast (i32 (i32)* @root to i8*), i8* %0)
  %1 = call i8* @llvm.returnaddress(i32 0)
  call void @__cyg_profile_func_exit(i8* bitcast (i32 (i32)* @root to i8*), i8* %1)
  ret i32 %x
}

I suspect this is due to the leaf() being inlined into root() even though inlining should happen after it. The legacy EntryExitInstrumenter pass seems to be added to the legacy FunctionPassManager first before all others passes. Under the legacy PM, when this pass runs, root() should look like:

; Function Attrs: nounwind
define i32 @root(i32 %x) #1 {
entry:
  %x.addr = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4, !tbaa !2
  %0 = load i32, i32* %x.addr, align 4, !tbaa !2
  %call = call i32 @leaf(i32 %0)  ; this call is not yet inlined
  ret i32 %call
}

but root() looks like this in the new PM pass:

; Function Attrs: norecurse nounwind readnone
define i32 @root(i32 returned %x) local_unnamed_addr #1 {
entry:
  ret i32 %x
}

Yeah, I think the entry/exit pass really wants to come *before* any other pass. That should be possible even in the new PM by using an extension point?

CodeGenCXX/auto-var-init.cpp: Auto var initialization is performed differently under new PM than legacy. With initialization for structs with default values, the legacy PM will store the stream of 0xAAs, then initialize the members with default values in a constructor, whereas the new PM will just store the correct value for the whole struct right away without initialization. In one case, it seems that ctor function was also inlined. Fixed by check accounting for these checks separately.

Cool, makes sense.

CodeGenCXX/conditional-temporaries.cpp: The new pm seems to be unable to statically determine the return value of some functions. For now I just add separate checks for the new PM IR.

Interesting. Definitely the right thing for this patch. Maybe file a PR to track understanding why the new PM struggles here.

CodeGenCXX/member-function-pointer-calls.cpp: Same as CodeGenCXX/conditional-temporaries.cpp. In this case, the new PM codegen also contains calls to lifetime start/end intrinsics. Perhaps this time it could be due to the intrinsics not being optimized out?

Weird, but that's also my best guess. Again, maybe we need a bug to track why this is different.

CodeGenObjC/os_log.m: Legacy test expects argument of ptrtoint and some functions to be a specific type and argument, though the new PM codegen uses a different type and argument which are also valid since @llvm.objc.retainAutoreleasedReturnValue(val) always returns val.

CodeGenObjCXX/os_log.mm: A function seems to have been inlined under the new PM. Here we just prevent inlining for the new PM run since the test expects this specific function call.

You could also use a noinline attribute in the code to express the *specific* call that needs to be retained?

Misc/pr32207.c [No new fix]: The banner is something that appears only in the legacy PM. I think the test only checks that the banner is printed and not so much that the particular pass is run.

Yeah, this seems far enough down in the weeds that just making it PM-specific is about the only option.

clang/lib/CodeGen/BackendUtil.cpp
1135–1139 ↗(On Diff #204808)

FYI, feel free to split out the entry/exit change (and its test) to a separate patch if you want. Either way really.

llvm/lib/Passes/PassBuilder.cpp
815–818 ↗(On Diff #204808)

I'd suggest splitting this into a separate patch to LLVM and adding an LLVM-side test to cover it as well.

leonardchan updated this revision to Diff 205708.EditedJun 19 2019, 4:40 PM
leonardchan marked 3 inline comments as done.

CodeGen/pgo-sample.c [No new fix]: The test checks that PruneEH runs, but there doesn’t seem to be a corresponding new PM pass for it. Should there be? If there is one then we can just check for that in the debug PM dump.

The analog would be function-attrs and function(simplify-cfg) in some combination. Maybe this should just check that some specific thing is optimized away at -O2 rather than checking the specific details of pass pipeline construction?

For now I added a check that those 2 extra passes are run in the debug pass dump. It seems that the original test just wanted to see that a particular pass was run, so I left it at that, but I can still add a separate test/run for checking that function-attrs and function(simplify-cfg) replace invoke instructions at -O2 (which is what I think PruneEH does).

Yeah, I think the entry/exit pass really wants to come *before* any other pass. That should be possible even in the new PM by using an extension point?

Moved this into D63577.

CodeGenCXX/conditional-temporaries.cpp: The new pm seems to be unable to statically determine the return value of some functions. For now I just add separate checks for the new PM IR.

Interesting. Definitely the right thing for this patch. Maybe file a PR to track understanding why the new PM struggles here.

CodeGenCXX/member-function-pointer-calls.cpp: Same as CodeGenCXX/conditional-temporaries.cpp. In this case, the new PM codegen also contains calls to lifetime start/end intrinsics. Perhaps this time it could be due to the intrinsics not being optimized out?

Weird, but that's also my best guess. Again, maybe we need a bug to track why this is different.

Created https://bugs.llvm.org/show_bug.cgi?id=42333 to track these 2.

CodeGenObjC/os_log.m: Legacy test expects argument of ptrtoint and some functions to be a specific type and argument, though the new PM codegen uses a different type and argument which are also valid since @llvm.objc.retainAutoreleasedReturnValue(val) always returns val.

CodeGenObjCXX/os_log.mm: A function seems to have been inlined under the new PM. Here we just prevent inlining for the new PM run since the test expects this specific function call.

You could also use a noinline attribute in the code to express the *specific* call that needs to be retained?

Hmm noinline doesn't seem to be working in this case. The call is made to a code generated builtin function (__os_log_helper).

clang/lib/CodeGen/BackendUtil.cpp
1135–1139 ↗(On Diff #204808)

Separated this into D63577

llvm/lib/Passes/PassBuilder.cpp
815–818 ↗(On Diff #204808)

Separated this into D63580 and added an LLVM test in that one

just a minor comment on one of these...

clang/test/CodeGen/pgo-sample.c
10 ↗(On Diff #205708)

s/PrunEH/PruneEH/

12–14 ↗(On Diff #205708)

The DAG worries me a bit ... The point here is to check that we remove EH before attaching sample profile data, and with the DAG it isn't clear that this happens *before*.

leonardchan marked 3 inline comments as done.
leonardchan added inline comments.
clang/test/CodeGen/pgo-sample.c
12–14 ↗(On Diff #205708)

It turns out that SampleProfileLoaderPass was added before either of these. I moved this into a separate patch that addresses it since it comes with a small clang + llvm change (D63626).

This revision is now accepted and ready to land.Jun 20 2019, 7:32 PM
This revision was automatically updated to reflect the committed changes.