This is an archive of the discontinued LLVM Phabricator instance.

[Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline
ClosedPublic

Authored by aeubanks on Mar 2 2023, 6:56 PM.

Details

Summary

We can infer more attribute information once functions are fully
simplified, so move the PostOrderFunctionAttrs pass after the function
simplification pipeline. However, just doing this can impact
simplification of recursive functions since function simplification
takes advantage of function attributes of callees (some LLVM tests are
actually impacted by this), so keep a copy of PostOrderFunctionAttrs
before the function simplification pipeline that only runs on recursive
functions.

For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830.

This requires some restructuring of the CGSCC NoRerun feature. We need
to cache the ShouldNotRunFunctionPassesAnalysis analysis after the
simplification is done, which now is after the second
PostOrderFunctionAttrs run, rather than after the function
simplification pipeline.

Compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u

Compile time increase from unconditionally running the first PostOrderFunctionAttrs:
https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u

Diff Detail

Event Timeline

aeubanks created this revision.Mar 2 2023, 6:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:56 PM
aeubanks requested review of this revision.Mar 2 2023, 6:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 6:56 PM
aeubanks edited the summary of this revision. (Show Details)Mar 2 2023, 7:01 PM
nikic added a comment.EditedMar 3 2023, 12:18 AM

Basically LGTM, I was experimenting with the same change recently. I'd like to have a PhaseOrdering test that demonstrates the benefit of this change more clearly though. (And of course, this needs updates to pipeline tests.)

aeubanks updated this revision to Diff 502200.Mar 3 2023, 11:25 AM

fix pipeline tests
add phase ordering test

nikic added inline comments.Mar 3 2023, 12:59 PM
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h
57

We also need to support this in pipeline parsing/printing I believe.

aeubanks updated this revision to Diff 502306.Mar 3 2023, 4:20 PM

add pipeline text parsing

nikic accepted this revision.Mar 4 2023, 12:45 AM

LGTM

This revision is now accepted and ready to land.Mar 4 2023, 12:45 AM
This revision was landed with ongoing or failed builds.Mar 6 2023, 9:02 AM
This revision was automatically updated to reflect the committed changes.
cfang added a subscriber: cfang.Jun 15 2023, 10:31 AM
cfang added inline comments.
llvm/lib/Passes/PassBuilderPipelines.cpp
838

We found that "nofree" attribute will greatly affect the earlyCSE pass in function simplification. In our case, missing nofree essentially disabled earlyCSE which resulted in a huge performance regression (register pressure). Other optimizations like GVN in function simplification may also be affected by function attributes.

So is it possible that we just run this pass twice? Thanks.

aeubanks added inline comments.Jun 15 2023, 11:06 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Can you give an example where EarlyCSE depends on function attributes? That doesn't really make sense to me, function attributes should really be used by callers of the function, not the function itself.

cfang added inline comments.Jun 15 2023, 2:45 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

EarlyCSE can make use of the attributes of the functions it calls.
The pass manager should be running passes such that the attributes
of callees are known before running optimizations on the callers

aeubanks added inline comments.Jun 15 2023, 3:03 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

yes the pass manager already takes care of visiting callees before callers with the CGSCCPassManager. the only case where we need to run function-attrs before the function simplification pipeline is when you have recursive functions, which this patch handles.

so I'm still not understanding where you'd be seeing a regression

cfang added inline comments.Jun 15 2023, 4:46 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

MainCGPipeline.addPass(PostOrderFunctionAttrsPass(/*SkipNonRecursive*/ false));

I switch SkipNonRecursive between false and true , and compare IR around the following earlyCSE pass in function simplification pipeline:

FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));
  1. Before this pass, the only difference is that when SkipNonRecursive == false; there is a nofree function attribute, but I am not clear about the difference in the callee. Some of them seem losing the "convergent" attribute
  1. After this earlyCSE pass, we see a huge IR difference, which I don't know the reason yet. But the difference is due to SkipNonRecursive flip.

So something goes not what we think it should be.

cfang added inline comments.Jun 23 2023, 12:20 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

addArgumentAttrs(Nodes.SCCNodes, Changed);

Looks to me this above function was not called to deduce function argument attributes.
For our case, missing WriteOnly/ReadOnly attributes affects memorySSA for earlyCSE, and we have observed may laods are not CSEd.

nikic added inline comments.Jun 23 2023, 12:27 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Please provide a PhaseOrdering test.

cfang added a comment.Jun 23 2023, 5:49 PM

llvm/lib/Passes/PassBuilderPipelines.cpp
838

Here is a simple test case where argpromotion is a pass between the first function attribute recognition and function simplification.
These argument attributes will affect optimizations like earlyCSE.

  • func-arg-attrs.ll -----------------------------------------

; RUN: opt < %s -O3 -print-after=argpromotion -disable-output 2>&1 | FileCheck %s

define i32 @func_arg_attrs(ptr %p1) {
; CHECK-LABEL: define i32 @func_arg_attrs
; CHECK-SAME: (ptr nocapture readonly %p1) local_unnamed_addr #{{[0-9]+}} {
;

%v1 = load i32, ptr %p1
ret i32 %v1

}

nikic added inline comments.Jun 24 2023, 12:40 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

A PhaseOrdering test is a minimal test that shows an end-to-end optimization difference for the whole -O3 pipeline before and after the change.

It's obvious that this changes the inferred attributes at specific pipeline positions, we are interested in how that affects the overall optimization pipeline.

cfang added inline comments.Jun 25 2023, 12:05 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

It is apparent that the missing argument attribute due to this patch will affect many optimization in the function simplifications.
Yes we can try to reduce the giant test case for a simple one, but it is not possible to cover all cases that will be affected. I hope you won't ask for all cases that will be affected by argument attrs. Thanks.

nikic added inline comments.Jun 25 2023, 12:24 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

No, this is not apparent at all. The inferred attributes are, to the most part, not relevant to the optimization of the function itself, only its callers. That's why it is sufficient to infer them at the end of function simplification for non-recursive functions.

Clearly, this does not hold up for your particular case, and we would like to understand why that is. However, your description of the issue has not been helpful in the least, and we are unlikely to make progress here without a test case.

cfang added a comment.Jun 25 2023, 4:13 PM

llvm/lib/Passes/PassBuilderPipelines.cpp
838

Fair enough. Out case shows that the argument attributes will affect early-cse in the function simplification pipeline.
Here is a test case that show the early-cse difference:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 2
; RUN: opt < %s -O3 -print-before=speculative-execution -S -disable-output 2>&1 | FileCheck %s

define i32 @arg_readonly_early_cse(ptr noalias %addr.i, ptr noalias %otheraddr) {
; CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn
; CHECK-LABEL: define i32 @arg_readonly_early_cse
; CHECK-SAME: (ptr noalias nocapture readonly [[ADDR_I:%.*]], ptr noalias nocapture readnone [[OTHERADDR:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[A:%.*]] = load i32, ptr [[ADDR_I]], align 4
; CHECK-NEXT: fence acquire
; CHECK-NEXT: ret i32 0
;

%a = load i32, ptr %addr.i, align 4
fence acquire
%a2 = load i32, ptr %addr.i, align 4
%res = sub i32 %a, %a2
ret i32 %res

}

aeubanks added inline comments.Jun 26 2023, 10:25 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
838
define i32 @arg_readonly_early_cse(ptr noalias %addr.i, ptr noalias %otheraddr) {
  %a = load i32, ptr %addr.i, align 4
  fence acquire
  %a2 = load i32, ptr %addr.i, align 4
  %res = sub i32 %a, %a2
  ret i32 %res
}

opt -p function-attrs,instcombine optimizes most of it away but opt -p instcombine doesn't (same with early-cse<memssa>).

this boils down to this checking noalias + readonly on arguments.

we could unconditionally perform argument attribute inference in the first function-attrs pass

cfang added inline comments.Jun 26 2023, 12:47 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

we could unconditionally perform argument attribute inference in the first function-attrs pass

This looks reasonable to me. Thanks!

nikic added inline comments.Jun 26 2023, 1:05 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Thanks for the test case! The core issue here seems to be a discrepancy between FunctionAttrs inference capabilities and AA ModRef modelling for fences. The first two question I would ask are:

  1. Is the inference performed by FunctionAttrs actually correct? Apparently the inference of the readonly/writeonly argument attributes ignores fences completely, which seems rather unlikely to be correct. AA models fences as ModRef-ing all memory, and FunctionAttrs memory attribute inference does the same.
  1. If FunctionAttrs is actually correct, can AA be improved to return the same result? I would expect this to be along the lines of "if the memory location is an unescaped identified local object, the fence cannot modref the location".

If the answer to one of those questions is "yes", we should fix the corresponding pass. If the answer to both is "no", we can look into recovering this attribute inference.

cfang added a subscriber: arsenm.Jun 26 2023, 2:17 PM
cfang added inline comments.Jun 26 2023, 2:23 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Your patch claims the function argument attributes won't affect the optimizations in function simplification pipeline. The test case just provided one example to show that earlyCSE depends on some of the attributes. If you claim is not correct, I think the best way is to add back the argument attribute inference.

aeubanks added inline comments.Jun 27 2023, 5:41 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Is the inference performed by FunctionAttrs actually correct? Apparently the inference of the readonly/writeonly argument attributes ignores fences completely, which seems rather unlikely to be correct. AA models fences as ModRef-ing all memory, and FunctionAttrs memory attribute inference does the same.

this treats fences as only Ref-ing constant memory, which seems to align with the LangRef:

noalias
This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.

I've sent out https://reviews.llvm.org/D153932 to make AA better about fences

cfang added a comment.Jul 10 2023, 3:57 PM

cfang added inline comments.Jul 10 2023, 4:04 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

The attached arg-attrs-test.ll is another test case that shows the effect of the kernel argument attribute on earlyCSE. Personally I believe, unless you can prove the argument attributes won't affect anything in function simplification pipeline, you can not ignore the argument attribute recognition pass before function simplification.

It is not appropriate to try to find and fix all tests that may be affected.

Thanks for help.

nikic added inline comments.Jul 11 2023, 2:18 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Thanks for the additional test case. The problem here is that we hit the MemorySSA clobber walk limit. For readonly noalias arguments we'll instead have liveOnEntry as the defining access, which bypasses the limit.

I think the different limit behavior is sufficient motivation to perform argument attr inference in the early FuncAttrs runs. I checked, and the compile-time impact is pretty low: http://llvm-compile-time-tracker.com/compare.php?from=74258f4189e2b6bacabd40ee6f588fd9d1b37741&to=7f7484aee497cdae817b4aa24aed8046f430a816&stat=instructions%3Au

cfang added inline comments.Jul 11 2023, 8:57 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

Performing argument attr inference in the early FuncAttrs runs is sufficient to recover the performance issue we encountered. Thank much!

cfang added inline comments.Jul 14 2023, 9:55 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
838

@nikic are you going to put that patch up for review?

Hi, @nikic and @aeubanks

Now that we agree performing argument attr inference in the early FuncAttrs runs,
can you plan to move ahead to make the change soon? Thanks.

Hi, @nikic and @aeubanks

Now that we agree performing argument attr inference in the early FuncAttrs runs,
can you plan to move ahead to make the change soon? Thanks.

@cfang could you take over the patch?