Page MenuHomePhabricator

[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
ClosedPublic

Authored by modimo on Aug 17 2017, 3:55 PM.

Details

Summary

Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with disable-thinlto-funcattrs=1) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:

  1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
  2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:

  1. noUnwind: function is noUnwind
  2. mayThrow: function contains a non-call instruction that Instruction::mayThrow returns true on (e.g. windows SEH instructions)
  3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Remove llvm/test/ThinLTO/X86/weak_externals.ll from diffs

modimo added a comment.EditedAug 24 2021, 5:31 PM

Checking build timing in release mode Clang self-build. Looking at purely thinlink timing:

ModeTime (s)
base2.254
base + propagation2.556
noinline8.870
noinline + propagation10.215

So 13% in base and 15% with noinline which seems reasonable for what it's doing.

@tejohnson Indirect calls are not captured in FunctionSummaries in CallGraph or in a flag form saying they exist. Also looks like speculative candidates for ICP do make it on the graph. For this analysis we need to bail out on indirect calls so I think the simplest way is to add a flag indicating the presence of them (In FunFlags?). As for the speculative candidates, it's probably not too big of a deal.

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect. For speculative devirtualization aka ICP, we will still be left with a fallback indirect call, so it would still need to be treated conservatively. The extra edges added for ICP promotion candidates shouldn't be a problem or affect this.

Note that with class hierarchy analysis we can do better for virtual calls, e.g. when -fwhole-program-vtables is enabled for whole program devirtualization and we have the necessary whole program visibility on vtables. We could potentially integrate WPD decision here. Even if we can't find a single devirtualization target, we can compute the set of all possible targets of virtual calls during the thin link and could use that information to basically merge the attributes from all possible targets. But probably best to leave that as a future refinement as it will take some additional work to get that hooked up. We'd still need to be conservative for virtual calls when we don't have the necessary type info (when -fwhole-program-vtables not enabled or we don't have whole program visibility on the vtable defs), or for non-virtual indirect calls.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
378

I'm not sure how this case could be happening as we haven't actually done the importing that would create these new available externally copies yet - that happens in the LTO backends, during the thin link we just add them to import lists.

383

There is no prevailing copy presumably because the prevailing copy is in a native library being linked? I think these cases can be handled conservatively.

408

You can just unconditionally do the getBaseObject call on the GVS without casting to AliasSummary. For non-AliasSummary it will just return itself.

414

I think this checking for virtual calls will only work if -fwhole-program-vtables is enabled for whole program devirtualization or CFI. Otherwise we don't have the type tests that cause this to get populated. This also won't detect non-virtual indirect calls.

442

Since the linker which invokes this should have been passed all objects to link, bitcode and native, it can do symbol resolution across all of them. So if there is an overriding native strong symbol, it should see that and the bitcode resolution would be non-prevailing and all bitcode copies marked dead (in computeDeadSymbols). So I think the weak any and linkonce any case can take the prevailing copy.

modimo marked 3 inline comments as done.Aug 26 2021, 9:31 PM

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

For speculative devirtualization aka ICP, we will still be left with a fallback indirect call, so it would still need to be treated conservatively. The extra edges added for ICP promotion candidates shouldn't be a problem or affect this.

Ah good point. I was thinking it may pessimize the propagation because of having to process all of these edges this is a no-go because of the fallback.

Note that with class hierarchy analysis we can do better for virtual calls, e.g. when -fwhole-program-vtables is enabled for whole program devirtualization and we have the necessary whole program visibility on vtables. We could potentially integrate WPD decision here. Even if we can't find a single devirtualization target, we can compute the set of all possible targets of virtual calls during the thin link and could use that information to basically merge the attributes from all possible targets. But probably best to leave that as a future refinement as it will take some additional work to get that hooked up. We'd still need to be conservative for virtual calls when we don't have the necessary type info (when -fwhole-program-vtables not enabled or we don't have whole program visibility on the vtable defs), or for non-virtual indirect calls.

Agreed, it's an engineering problem more than anything. I ran an optimistic build (same revisions as before, release + noinline) where indirect and virtual calls were assumed to always propagate (thinlto_prop_optimistic) and the effect in Clang self-build at least is not too large:

thinlto_base/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 217515,
"dwarfehprepare.NumNoUnwind": 299126,
"dwarfehprepare.NumUnwind": 332785,

thinlto_prop/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 158372,
"dwarfehprepare.NumNoUnwind": 420918,
"dwarfehprepare.NumUnwind": 210870,

thinlto_prop_optimistic/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 154958,
"dwarfehprepare.NumNoUnwind": 425893,
"dwarfehprepare.NumUnwind": 205889,

(425893-420918)/(420918-299126) = 4% boost over being conservative and correct. This may change in real workloads though so I added a thinlto-funcattrs-optimistic-indirect flag for easy measurement.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
378

I added the test funcattrs-prop-exported-internal.ll that demonstrates this. The summary has its internal linkage converted to external in thinLTOResolvePrevailingInIndex then converted to AvailableExternally in thinLTOResolvePrevailingGUID. Currently being handled conservatively since a prevailing copy doesn't exist.

383

Yeah the prevailing copy is in the native binary.

This is a C++ specific feature which has ODR and these are already being propagated/inlined from in pre-link. The current thinlink propagation implementation is conservative because a prevailing copy doesn't exist. Currently being handled conservatively since a prevailing copy doesn't exist.

414

I see. I added hasUnknownCall as an explicit flag for all indirect calls that should capture both cases.

442

That makes it much easier and everything folds into the prevailing case! Changed and added a test for it.

modimo updated this revision to Diff 369024.Aug 26 2021, 9:32 PM
modimo marked 2 inline comments as done.

Use prevailing for linkonce/weak. Add hasUnknownCall to model virtual and indirect calls

modimo updated this revision to Diff 369158.Aug 27 2021, 12:06 PM

Check for CachedAttributes count in map rather than value so conservative results are not re-calculated

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

What about the other instruction checks done in Instruction::mayThrow i.e. http://llvm-cs.pcc.me.uk/lib/IR/Instruction.cpp#592?

modimo added a comment.EditedSep 9 2021, 8:14 PM

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

What about the other instruction checks done in Instruction::mayThrow i.e. http://llvm-cs.pcc.me.uk/lib/IR/Instruction.cpp#592?

Good point! CleanupReturnInst and CatchSwitchInst are windows SEH specific representations for asynchronous exceptions but definitely should be covered for correctness. For ResumeInst it's the "return" of a landing pad and in order for a landing pad to be reachable AFAIK an invoke must exist so is captured by the call graph. I'll add a scan for Instruction::mayThrow in summary building. Having a mayThrow flag or making NoUnwind a tri-state flag in the summary makes sense to me to capture this case.

As a side note to why there's a check for ResumeInst at all: an invoke instructions actually never has "mayThrow" set. I haven't delved too deep but this could be changed to check for invokes instead since a dead landing pad at attribute inference time can lead to pessimization of NoUnwind in cases I've seen (alternatively, making sure CFG opts run before this to make sure this is pruned away).

modimo updated this revision to Diff 372398.Sep 13 2021, 9:47 PM

Add mayThrow flag

Ok thanks. I need to go through the propagation code and tests again more closely now, but one question/suggestion below in the meantime.

llvm/include/llvm/IR/ModuleSummaryIndex.h
580

Now that we have MayThrow, can we avoid a separate hasUnknownCall bool and just conservatively set MayThrow true in that case?

Ok thanks. I need to go through the propagation code and tests again more closely now, but one question/suggestion below in the meantime.

Thanks!

llvm/include/llvm/IR/ModuleSummaryIndex.h
580

hasUnknownCall is used for norecurse and other future flags as well to stop propagation.

tejohnson added inline comments.Sep 16 2021, 6:14 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
580

Ah that makes sense.

Thanks for your patience, finally had a chance to go through everything much more carefully. Looks good, mostly a bunch of small or nitpicky final suggestions. The main comment/question of significance relates to where hasUnknownCall is being set currently.

Patch title and summary need an update.

clang/test/CodeGen/thinlto-funcattr-prop.ll
15

This is checking the summary generated by opt, not the result of the llvm-lto2 run.

llvm/include/llvm/IR/ModuleSummaryIndex.h
575

No Unwind needs a comment. And probably a note that it will be updated by function attr propagation. Depends on how we want to handle inline asm calls and other cases that currently set this true below (see my comment there).

580

nit, maybe change this to hasIndirectCall which I think is more specific?

llvm/include/llvm/LTO/LTO.h
26

Is this needed?

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
379

Should this be moved below the following checks for inline asm and direct calls? (Not sure what the direct calls case is given that we handle direct calls to "known functions" above though).

If it should stay where it is and treat the below cases as unknown, probably should add tests for them.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
336

Suggest renaming calculateDefinitiveAttributes and CachedAttributes to something like calculatePrevailingSummary and CachedPrevailingSummary which are more accurate now.

478

You've already set InferredFlags.NoUnwind to false above this loop in the case where MayThrow was set on the CallerSummary.

493

I think you can remove this and the below setNoUnwind() call on CachedAttributes[V] since presumably this points to one of the function summaries we update in the below loop.

525

nit: suggest "Insert propagated function attributes from the Index ..."

531

Consider consolidating this function with thinLTOResolvePrevailingInModule, to reduce the number of walks of the module and lookups into the DefinedGlobals map.

llvm/test/ThinLTO/X86/funcattrs-prop-indirect.ll
3 ↗(On Diff #372398)

Have a second version that tests with -thinlto-funcattrs-optimistic-indirect? I don't see a test for that option anywhere. Or maybe just remove that option - is it really needed?

9 ↗(On Diff #372398)

Perhaps this CHECK-NOT should just look for "Function Attrs:" as you do in some other tests below, in case some other attr is ever added to the IR that isn't related to this propagation, which could allow the CHECK-NOT to succeed for the wrong reasons?

llvm/test/ThinLTO/X86/funcattrs-prop.ll
9

Since linkonce and weak are interposable, it isn't really correct to say that individual callers may optimize using different copies (we try to prevent this in the compiler since the are interposable).

30

s/recursing/throwing/ on these 2 comments?

64

Suggest putting comments above this one and call_weak_may_unwind below to indicate why one gets the nounwind and the other doesn't (i.e. that the thin link command above selects as prevailing the nounwind version of linkonce_may_unwind from b.ll and the may throw version of weak_may_unwind from c.ll)

77

For clarity on what this is actually testing, suggest renaming these as ATTR_NOUNWIND and ATTR_MAYTHROW, or something like that (they are both norecurse, so the current name is a little misleading as to what is being checked).

tejohnson added inline comments.Sep 22 2021, 10:45 AM
llvm/include/llvm/IR/ModuleSummaryIndex.h
575

Woops, the second sentence here "Depends on how we want to handle inline asm calls and other cases that currently set this true below (see my comment there)." I meant to add to the comment further below about the name of hasUnknownCall.

modimo updated this revision to Diff 374431.Sep 22 2021, 8:10 PM
modimo marked 23 inline comments as done.

Address feedback, rename funcattrs-prop-indirect.ll to funcattrs-prop-unknown.ll

clang/test/CodeGen/thinlto-funcattr-prop.ll
15

Fixed.

llvm/include/llvm/IR/ModuleSummaryIndex.h
580

My thinking is that the flag is a catch-all for blocking propagation and could conceivably be set for other reasons. It also matches the existing usage in FunctionAttrs.cpp for local propagation which also sets this for functions that are OptNone.

llvm/include/llvm/LTO/LTO.h
26

Yeah, thinLTOPropagateFunctionAttrs resides in FunctionAttrs.h and runThinLTO calls it to propagate.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
379

Any call that isn't emitted to the summary CallGraphEdges is a hole in propagation knowledge.

Direct calls case is from https://reviews.llvm.org/D40056 which is handling:

; Test calls that aren't handled either as direct or indirect.
call void select (i1 icmp eq (i32* @global, i32* null), void ()* @f, void ()* @g)()

Neat that select can be consolidated into a call, though I wonder if it should be allowed given it could be canonicalized to be another IR instruction above it and maybe eliminate this edge case.

Tangent aside, since in all these cases the call isn't part of the static callgraph HasUnknownCall needs to be set for correctness. Tests added in funcattrs-prop-unknown.ll (replacing funcattrs-prop-indirect.ll since we're handling more than just indirect here).

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
478

Good catch, this case should be querying CalleeSummary MayThrow.

493

Makes sense, removed. I like keeping the stats/debug tracking around though.

531

Good idea, merged and renamed thinLTOResolvePrevailingInModule to thinLTOFinalizeInModule

llvm/test/ThinLTO/X86/funcattrs-prop-indirect.ll
3 ↗(On Diff #372398)

Good point, option removed.

llvm/test/ThinLTO/X86/funcattrs-prop.ll
9

True, this comment is a left-over from the first interpretation of linkage models which have since been fixed with your help :). I'll update this to reflect that only prevailing matters here.

modimo retitled this revision from [ThinLTO] Add norecurse function attribute propagation to [ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation.Sep 22 2021, 8:34 PM
modimo edited the summary of this revision. (Show Details)

Few follow ups below.

clang/test/CodeGen/thinlto-funcattr-prop.ll
17

I believe this corresponds to call_extern - why aren't we getting noRecurse and noUnwind propagated here?

(also, suggest adding a comment above each of these summaries as to what function name they correspond to)

llvm/include/llvm/Transforms/IPO/FunctionImport.h
232

Suggest either removing default since you are always passing this argument, or default it to true and stop passing it in the places where it is true (since generally we want this to be true except in a few ThinLTOCodeGenerator.cpp locations that are testing specific things that don't involve propagation). Some preference for the former option (removing default), to make sure any new callers that get added think through the appropriate value.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
479

Please make sure one of the may throw propagation tests would fail without this fix (i.e. when it was checking the caller's maythrow setting).

llvm/lib/Transforms/IPO/FunctionImport.cpp
1139

Can this be merged with updateLinkage so we only do the DefinedGlobals lookup once per symbol?

llvm/test/ThinLTO/X86/funcattrs-prop.ll
7

Nit, line length

modimo updated this revision to Diff 374723.Sep 23 2021, 9:17 PM
modimo marked 3 inline comments as done.

Address follow-ups

clang/test/CodeGen/thinlto-funcattr-prop.ll
17

Tracing through llvm-lto2 the index is written out by CombinedIndexHook before the rest of thinlink including attribute propagation takes place. The attributes do end up successfully getting propagated, I'll add a check for that in the *1.promote.bc which shows the outcome of the attributes being propagated.

Good idea, added the function name that correspond to each summary.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
479

Thinking more on why this didn't manifest strange behavior: because of the BU order of call-graph traversal any callee that has mayThrow will have its inferred noUnwind set to false above. Checking again in the caller is redundant because the noUnwind property of the callee will be determined by its value of noUnwind only. I think removing this check completely makes sense.

I can think of a scenario where there are mayThrow instructions but the function is still marked noUnwind (noexcept function with a throw in it) but in that case it is safe to propagate upwards because any exception will fail to escape this callee and so checking mayThrow would actually be a pessimization. I added a case in funcattrs-prop-maythrow.ll to illustrate this.

llvm/lib/Transforms/IPO/FunctionImport.cpp
1139

Sure, merged.

tejohnson accepted this revision.Sep 23 2021, 10:19 PM

lgtm (one minor comment issue noted below). Thanks!

clang/test/CodeGen/thinlto-funcattr-prop.ll
17

Incomplete sentence, seems to be missing the rest of the explanation about when it is written.

This revision is now accepted and ready to land.Sep 23 2021, 10:19 PM
modimo updated this revision to Diff 374934.Sep 24 2021, 12:49 PM
modimo marked an inline comment as done.

Complete explanation in thinlto-funcattr-prop.ll, also fix up diff to contain all changes.

clang/test/CodeGen/thinlto-funcattr-prop.ll
17

Nice catch, sentence is now complete.

This revision was landed with ongoing or failed builds.Sep 27 2021, 12:28 PM
This revision was automatically updated to reflect the committed changes.

Thanks for the thorough review @tejohnson! I'll do additional validation on FB code that uses exceptions and if that all looks good I'll send up a change to turn this default on with the findings.