This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
ClosedPublic

Authored by modimo on Aug 17 2017, 3:55 PM.

Details

Summary

Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with disable-thinlto-funcattrs=1) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:

  1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
  2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:

  1. noUnwind: function is noUnwind
  2. mayThrow: function contains a non-call instruction that Instruction::mayThrow returns true on (e.g. windows SEH instructions)
  3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ncharlie added inline comments.Oct 13 2017, 11:49 AM
lib/Transforms/IPO/FunctionImport.cpp
438

This is currently failing a test case that uses printf. Since printf doesn't have a summary associated with it (i.e. the SummaryList is empty), I can't determine if it recurses and have to fail out early.

Is there some spot I should add code to create a FunctionSummary for external functions?

tejohnson added inline comments.Oct 13 2017, 12:20 PM
lib/Transforms/IPO/FunctionImport.cpp
438

How is printf handled in the full LTO case? Does it have attributes indicating that it is no recurse? Otherwise I think treating it conservatively is the only option.

ncharlie added inline comments.Oct 13 2017, 12:56 PM
lib/Transforms/IPO/FunctionImport.cpp
438

In FullLTO it's able to determine that printf doesn't recurse by analyzing the callsite (http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/FunctionAttrs.cpp#1068). Maybe I could modify ModuleSummaryAnalysis so it adds inserts the flags into the FunctionSummary while it still has access to the CallSite?

tejohnson added inline comments.Oct 13 2017, 1:08 PM
lib/Transforms/IPO/FunctionImport.cpp
438

So then that means something is able to mark the external function call as NoRecurse. Maybe because it is a known libcall.

The question is what would you mark in the summary during ModuleSummaryAnalysis? Specifically, if the same function containing the call to a NoRecurse external function contained another call to a different function (external or not) that is not NoRecurse, the caller's function summary flags will conservatively not say NoRecurse.

So in that case it does seem useful to have function summaries for external functions that contain this info. However, that might have effects in other places. My suggestion is to be conservative now, and we can think about making that enhancement as a follow-on. (Unless you had something different in mind - let me know as I haven't thought through this too closely for awhile)

ncharlie updated this revision to Diff 128852.Jan 6 2018, 11:08 AM

Skip functions that don't have a FunctionSummary. Return an empty root if a callgraph root can't be found.

ncharlie added inline comments.Jan 6 2018, 11:09 AM
lib/Transforms/IPO/FunctionImport.cpp
438

Ok, I've added logic to skip over functions that don't have a FunctionSummary associated.

Sorry again for the delay on the reviews, not sure how I missed that they were ready. Feel free to ping me much more frequently if I don't respond - weekly is good!

include/llvm/IR/ModuleSummaryIndex.h
662 ↗(On Diff #128852)

Remove braces. Also, does this need to be part of D36311?

lib/LTO/LTO.cpp
1051

ThinLTOCodeGenerator is used by ld64 and other linkers that use the legacy LTO interface, and can be tested using llvm-lto. We should have these calls for both LTO APIs. I think earlier you had them only in ThinLTOCodeGenerator, and I asked to add them here, but they got removed from there in the process. If you can put it back then your test can add llvm-lto based testing lines as well. Unfortunately we still have these two LTO interfaces and need to keep them in sync.

lib/Transforms/IPO/FunctionImport.cpp
432

Add a TODO here to consider adding summaries for external nodes to hold function attribute information.

440

I assume we skip when NoRecurse is already true because it means it was already analyzed? If so, add comment to that effect.

446

Isn't a node with a GUID of 0 the dummy root node (as per comment above)? Would we ever have an edge to it?

457

Above we already returned if this flag was true, so I think this if check is unnecessary.

modimo added a subscriber: modimo.Jun 1 2021, 4:49 PM

@tejohnson Do you know more about the status of this patch/@ncharlie? I'd like to help push this forward and can take over if they're no longer available.

@tejohnson Do you know more about the status of this patch/@ncharlie? I'd like to help push this forward and can take over if they're no longer available.

The patch has been dormant since @ncharlie hasn't been active in LLVM for awhile, and no one else has had a chance to pick it up. Would be great if you would like to take it over and push it forward.

modimo added a comment.Jun 2 2021, 6:13 PM

@tejohnson Do you know more about the status of this patch/@ncharlie? I'd like to help push this forward and can take over if they're no longer available.

The patch has been dormant since @ncharlie hasn't been active in LLVM for awhile, and no one else has had a chance to pick it up. Would be great if you would like to take it over and push it forward.

Can do, I'll send out an updated revision. Should I directly edit this diff or create a new one linked back?

@tejohnson Do you know more about the status of this patch/@ncharlie? I'd like to help push this forward and can take over if they're no longer available.

The patch has been dormant since @ncharlie hasn't been active in LLVM for awhile, and no one else has had a chance to pick it up. Would be great if you would like to take it over and push it forward.

Can do, I'll send out an updated revision. Should I directly edit this diff or create a new one linked back?

Hmm, maybe best if you do a Commandeer Revision and take this one over, that way we keep the comment history.

modimo commandeered this revision.Jun 7 2021, 11:43 PM
modimo added a reviewer: ncharlie.
modimo updated this revision to Diff 350508.Jun 7 2021, 11:46 PM

Rebased, cleaned up and updated implementation. Also fixed up all the tests to be correct. Note that currently lld/COFF tests fail from LLD_IN_TEST=2 which runs tests in memory twice. I checked that this also fails from doing SCC traversal/update from synthetic function entry counts so it's not exclusive to my changes. I'm looking into this but sending this out in the mean time for review.

Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2021, 11:46 PM
modimo updated this revision to Diff 350509.Jun 8 2021, 12:00 AM

@tejohnson
I was looking at the initial implementation and looking at purely the front entry of the SummaryList seems suspect given:

  1. There could be multiple entries from linkonce_odr
  2. The first entry isn't guaranteed to be a FunctionSummary

Looking around I saw that SafeStack (https://github.com/llvm/llvm-project/commit/47552a614a8c95e1817d83755a4a6a2508da7f8a) took these into account with findCalleeFunctionSummary which to me makes more sense as an implementation. For function attribute propagation, we can also handle linkonce_odr naively by doing a conservative union of the attributes.

More aggressively, propagating based off of the prevailing definition would be more powerful/simpler but there's some nuance shown in the linkonce_functionattrs_comdat.ll test added. If we propagated the prevailing definition g would be marked as norecurse. However, if the local copy of f is inlined into g that attribute would be incorrect and AFAICT currently thinLTO doesn't enforce using the prevailing definition everywhere presumably because it will break code that relies on using the local copy.

I'll definitely measure the compile time cost of this. Wanted to get some feedback on if this path makes sense first.

modimo updated this revision to Diff 351606.Jun 11 2021, 7:18 PM
modimo edited the summary of this revision. (Show Details)

Clean up logic around bailing out. Separate out thinLTO statistics from general statistics. Run clang-format.

Gathering some stats on Clang release thinlto self-build:
"function-attrs.NumThinLTONoRecurse": 1542018,

Which isn't unexpected given that the vast majority of functions are norecurse. The counting statistics here are a bit skewed because they are done per summary updated and with operating on weak symbols it may make more sense to count once per ValueInfo.

Notably no other statistics shifted with enabling/disabling propagation and a search through LLVM shows only globalopt processInternalGlobal actually uses norecurse ATM. Stats-wise this doesn't fire in either case so it makes sense why codegen doesn't shift.

In that sense, propagating norecurse doesn't seem very valuable. I then tested out nounwind as a better candidate given that currently propagation halts on de-refinable/weak functions when done per-module. Processing in the index, we know every definition and if there's any external definitions meaning propagation can be done confidently through these functions.

Gathering some stats I added (D104161) on a release thinlto -fno-inline build (inlining duplicates landing pads which makes comparisons harder):

thinlto/

"dwarfehprepare.NumFunctionsProcessed": 630099,
"dwarfehprepare.NumCleanupLandingPads": 214916,
"dwarfehprepare.NumNoUnwind": 299357,

thinlto_withpropagation/

"dwarfehprepare.NumFunctionsProcessed": 629976,
"dwarfehprepare.NumCleanupLandingPads": 160635,
"dwarfehprepare.NumNoUnwind": 412014,

Shows a 412014/299357=38% increase in NoUnwind functions and a 160635/214916=25% decrease in the number of cleanup landing pads which is a considerable reduction in EH code.

Measuring performance of Thin Link on Clang self-build I'm seeing a pretty nasty hit of 10s baseline -> 24s with propagation. I'm investigating because as-is this is way too much.

Sorry for not being responsive. I've been out of office but will be back Monday. I skimmed through your notes, thanks for all the stats, it looks like nounwind is a good direction. Regarding linkonce_odr, I would think you should be able to take the union of their attributes, since they should be interchangeable. There should not be a requirement to use the local copy for linkonce_odr. After propagation, wouldn't their attributes be the same (i.e. regardless of inlining, since the callee attributes should presumably propagate up into the callee)? How much difference would it cause on your statistics and on the compile time to not use the prevailing copy?

Sorry for not being responsive. I've been out of office but will be back Monday.

No worries!

I skimmed through your notes, thanks for all the stats, it looks like nounwind is a good direction. Regarding linkonce_odr, I would think you should be able to take the union of their attributes, since they should be interchangeable.

Looking into this more my example in linkonce_functionattrs_comdat.ll is UB given that it's violating language ODR with 2 functionally different definitions. I think under the langref we're safe to propagate off of any single copy of a linkonce_odr however practically speaking taking the union of all of them may be safer and also handle cases where different attributes appear due to de-refinement.

There should not be a requirement to use the local copy for linkonce_odr. After propagation, wouldn't their attributes be the same (i.e. regardless of inlining, since the callee attributes should presumably propagate up into the callee)?

In the original approach without taking the union a different copy could propagate "norecurse" to the caller but the local definition could actually recurse. If this is inlined, we'll get a caller with the "norecurse" attribute incorrectly which in reality does recurse. Taking the conservative union fixes this issue.

Sorry for not being responsive. I've been out of office but will be back Monday.

No worries!

Thanks again for working on this! Finally had a chance to go through the whole patch.

I skimmed through your notes, thanks for all the stats, it looks like nounwind is a good direction. Regarding linkonce_odr, I would think you should be able to take the union of their attributes, since they should be interchangeable.

Looking into this more my example in linkonce_functionattrs_comdat.ll is UB given that it's violating language ODR with 2 functionally different definitions. I think under the langref we're safe to propagate off of any single copy of a linkonce_odr however practically speaking taking the union of all of them may be safer and also handle cases where different attributes appear due to de-refinement.

There should not be a requirement to use the local copy for linkonce_odr. After propagation, wouldn't their attributes be the same (i.e. regardless of inlining, since the callee attributes should presumably propagate up into the callee)?

In the original approach without taking the union a different copy could propagate "norecurse" to the caller but the local definition could actually recurse. If this is inlined, we'll get a caller with the "norecurse" attribute incorrectly which in reality does recurse. Taking the conservative union fixes this issue.

One possibility is to just look at the copy the linker chose as prevailing. We can pass in the isPrevailing callback (see the call to thinLTOInternalizeAndPromoteInIndex just before your new calls to thinLTOPropagateFunctionAttrs).

I have some other ideas listed in the embedded comments for hopefully making it a bit more efficient, and some other comments/questions.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
101 ↗(On Diff #351606)

Might be good to commit this off by default at first, and enable for the new tests. Then it will be easier to do more extensive testing (correctness, compile time, performance), e.g. for our internal apps.

334 ↗(On Diff #351606)

In general there needs to be better comments below for the various cases. I see, this looks to be cloned from the StackSafety version, which unfortunately did not undergo a code review before commit and I missed until now... Some questions below that you'll probably need to investigate.

343 ↗(On Diff #351606)

We should never have !hasAliasee() here. That should only be true in a couple special cases which don't apply here (in the backends when reading a partial index file emitted for distributed ThinLTO, or when building summaries when reading llvm assembly).

348 ↗(On Diff #351606)

If the linkage is local and we have more than one summary for this guid we can probably just quit early - that should be a weird corner case that can be handled conservatively, i.e. by not propagating. Normally we expect that local symbols from different modules will have different guid's and therefore ValueInfos because the guid is computed by prepending the module path. We could have a guid alias if there wasn't any distinguishing path when each file was compiled, but that should be rare and we can just punt.

351 ↗(On Diff #351606)

What if the other summaries are seen after this external linkage summary in the list?

Presumably these cases would be when we have a strong def that overrides weak definitions. In either case the external linkage symbol would presumably have been prevailing, and we can probably assert on that fact here if we pass in the isPrevailing callback.

355 ↗(On Diff #351606)

For weak and the below available externally and linkonce cases, we presumably could:

  • pick the first one if ODR
  • pick the prevailing copy in all cases

I think? We can pass in the isPrevailing callback (see the calls to thinLTOInternalizeAndPromoteInIndex just before the calls you added to thinLTOPropagateFunctionAttrs).

365 ↗(On Diff #351606)

Can this handling be folded into the above loop so that we don't have to walk the list of summaries again? I.e. a lambda called for each summary before adding to the list of summaries.

Also, I think the whole while loop could be replaced with something like:

if (FunctionSummary *FS = dyn_cast<FunctionSummary>(S->getBaseObject()))
  ResolvedSummaries.push_back(FS);
else
  return {};

See my specific notes below about some of the cases currently being handled.

372 ↗(On Diff #351606)

We should never have !hasAliasee() here. That should only be true in a couple special cases which don't apply here (in the backends when reading a partial index file emitted for distributed ThinLTO, or when building summaries when reading llvm assembly).

375 ↗(On Diff #351606)

I don't understand what would cause this case.

410 ↗(On Diff #351606)

It seems like we would do the same thing here many times for a frequently called function. Can we save some info in a lazily built map and reuse it when already computed? I.e. either the result of getFunctionSummaries or even better a bool of whether any of those summaries might recurse.

422 ↗(On Diff #351606)

Can this be folded into the above loop, i.e. where it is currently doing the std::for_each and adding to calleeSummaries? Just update calleesMightRecurse there instead of adding to a new set and walking again here.

428 ↗(On Diff #351606)

Instead of adding the new setFFlags and doing the checking first here, perhaps just add an interface to set the NoRecurse flag on S and call it unconditionally? Eventually setters can be added for other flags as needed.

modimo marked 7 inline comments as done.Jul 7 2021, 12:26 AM
modimo added inline comments.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
101 ↗(On Diff #351606)

Sure, flipped.

334 ↗(On Diff #351606)

Agreed, I've added summarizing comments on what we're doing here.

343 ↗(On Diff #351606)

Makes sense, changed.

348 ↗(On Diff #351606)

That makes sense, punting on it.

365 ↗(On Diff #351606)

I think the original intention from the unreviewed code is that peel through indirection layers of AS->AS->FS. I suspect this isn't too common, I can try adding some diagnostic code to see how many indirections we need as part of the patch.

375 ↗(On Diff #351606)

This and the rest of the functions are being re-implemented from top-down.

modimo updated this revision to Diff 356881.Jul 7 2021, 12:47 AM
modimo marked an inline comment as done.

I'm headed out on vacation until Aug. 1st so sending an update of WIP-ish revision. Will definitely pick this up again when I get back :D

Appreciate the review and the explanations @tejohnson. I did a full refactoring of how we calculate attributes and added a caching layer.

I added WIP nounwind propagation so folks can look it over and try it out, known deficiencies:

  1. thinLTO version hasn't been updated for the new field. Will definitely do that (or split it out from NoRecurse as a separate change?)
  2. testing needs to be cleaned up for this field

Also:

  1. Need to evaluate timetrace to see if the caching strategy is better for build time
  2. Add quite a bit more testing to validate how we're handling all the different linkage scenarios

For reference on nounwind propagation now:
Build compiler with this change on top of b16400449fc763fdae2d2ce809ce61c88acb6684 building cd0a1226b50081e86eb75a89d01e8782423971a0

thinlto/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 216754,
"dwarfehprepare.NumNoUnwind": 300957,
"dwarfehprepare.NumUnwind": 332785,

thinlto_withpropagation/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 90401,
"dwarfehprepare.NumNoUnwind": 513744,
"dwarfehprepare.NumUnwind": 119360,

Thanks - I know you are still working on this, but I had a few comments so far. I haven't had a chance to test it yet. Unfortunately, the nounwind propagation shouldn't do much on our side as we disable exceptions internally.

clang/test/CodeGen/thinlto-distributed-cfi-devirt.ll
39 ↗(On Diff #356881)

Do we expect the new noUnwind flag to show up here somewhere?

clang/test/CodeGen/thinlto-distributed-cfi.ll
27 ↗(On Diff #356881)

Ditto

llvm/include/llvm/IR/ModuleSummaryIndex.h
592 ↗(On Diff #356881)

The capitalization here is off - the LLVM asm parsing expects lower camel case.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
354 ↗(On Diff #356881)

Of the below cases, 1 can happen and we should just do something conservative. 2 and 3 should not and we can assert.

373 ↗(On Diff #356881)

When does this happen?

405 ↗(On Diff #356881)

If for some reason (GUID alias due to local name without paths or other rare case) we get a non FunctionSummary, just early return {} and remove a level of nesting?

408 ↗(On Diff #356881)

This can happen for the reasons mentioned in your comment above. No need for an error message, just early return {} for conservative behavior.

421 ↗(On Diff #356881)

This should never happen. The linker should already have given a multiply defined symbol error.

443 ↗(On Diff #356881)

Won't we still have a copy marked prevailing? Wondering if the weak linkage cases can all be merged.

445 ↗(On Diff #356881)

I don't understand this case. If there is no prevailing symbol in the IR for this GUID then presumably it should not have been marked live. Are you seeing this kick in?

453 ↗(On Diff #356881)

In all of the cases here, other than the hasNonODR case which I don't understand (yet), we should have a single prevailing FunctionSummary. Can we just cache that, rather than e.g. copying all of its callees?

568 ↗(On Diff #356881)

ReadNone and ReadOnly aren't getting propagated yet afaict, so probably add a note to that effect here.

576 ↗(On Diff #356881)

Nit: suggest simplifying all of these to something like:
if (FS->fflags().NoRecurse && !F.doesNotRecurse())

F.setDoesNotRecurse();
modimo marked 13 inline comments as done.Aug 20 2021, 7:53 PM
modimo added inline comments.
clang/test/CodeGen/thinlto-distributed-cfi-devirt.ll
39 ↗(On Diff #356881)

Yeah. I changed the printing logic back to not print anything when no flags are set since quite a few tests depended on this behavior so these tests ultimately remain unchanged. I added a new test here to explicitly check that the flags from propagation show up correctly.

llvm/include/llvm/IR/ModuleSummaryIndex.h
592 ↗(On Diff #356881)

Changing back to camel case, also added parsing code for noUnwind.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
373 ↗(On Diff #356881)

Good question. I looked further into exactly what triggers this in clang self-build and there are summaries which are AvailableExternally but have no Prevailing copy. I added more about this in the source comments but TL;DR these end up being edge cases that can be ignored.

Changed the logic to go conservative in these cases.

405 ↗(On Diff #356881)

Sounds good, removed

443 ↗(On Diff #356881)

Yeah, there will still be a copy that's prevailing. Reading through the linkage descriptions again and also those in FunctionImportGlobalProcessing::getLinkage:

  1. I think with External/WeakODR/LinkOnceODR once the prevailing is found use that copy
  1. For Weak/LinkOnce even with a prevailing copy I don't know if the copy ultimately used will be prevailing. I'm wondering if a native definition could be the victor in which case we just can't propagate off these functions.

WDYT about (2)? For C++ at least these don't seem to really exist and testing with Clang self-build I'm not seeing this kick in anywhere. I added a flag to specifically disable this case so it's easy to test out the differences.

445 ↗(On Diff #356881)

Same answer as above:

Good question. I looked further into exactly what triggers this in clang self-build and there are summaries which are AvailableExternally but have no Prevailing copy. I added more about this in the comments but TL;DR these end up being edge cases that can be ignored.

Changed the logic to go conservative in these cases.

453 ↗(On Diff #356881)

Good idea, I'll do that. The combined case I suspect will be quite rare in C++ code so the cache only captures the FunctionSummaries and leaves generating a merged callee graph at use time.

modimo updated this revision to Diff 367937.Aug 20 2021, 7:58 PM
modimo marked 3 inline comments as done.
modimo edited the summary of this revision. (Show Details)

Adding more test cases and changed logic around weak linkages

Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2021, 7:58 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@tejohnson Indirect calls are not captured in FunctionSummaries in CallGraph or in a flag form saying they exist. Also looks like speculative candidates for ICP do make it on the graph. For this analysis we need to bail out on indirect calls so I think the simplest way is to add a flag indicating the presence of them (In FunFlags?). As for the speculative candidates, it's probably not too big of a deal.

Thanks - I know you are still working on this, but I had a few comments so far. I haven't had a chance to test it yet. Unfortunately, the nounwind propagation shouldn't do much on our side as we disable exceptions internally.

I've been iterating on this with Clang self-build with exceptions enabled. Once this gets into a good state logically I'll start testing on some of our internal workloads which generally enable exceptions and report back on the findings.

modimo updated this revision to Diff 368174.Aug 23 2021, 12:33 PM

Minor test fixups

modimo updated this revision to Diff 368178.Aug 23 2021, 12:38 PM

Remove llvm/test/ThinLTO/X86/weak_externals.ll from diffs

modimo added a comment.EditedAug 24 2021, 5:31 PM

Checking build timing in release mode Clang self-build. Looking at purely thinlink timing:

ModeTime (s)
base2.254
base + propagation2.556
noinline8.870
noinline + propagation10.215

So 13% in base and 15% with noinline which seems reasonable for what it's doing.

@tejohnson Indirect calls are not captured in FunctionSummaries in CallGraph or in a flag form saying they exist. Also looks like speculative candidates for ICP do make it on the graph. For this analysis we need to bail out on indirect calls so I think the simplest way is to add a flag indicating the presence of them (In FunFlags?). As for the speculative candidates, it's probably not too big of a deal.

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect. For speculative devirtualization aka ICP, we will still be left with a fallback indirect call, so it would still need to be treated conservatively. The extra edges added for ICP promotion candidates shouldn't be a problem or affect this.

Note that with class hierarchy analysis we can do better for virtual calls, e.g. when -fwhole-program-vtables is enabled for whole program devirtualization and we have the necessary whole program visibility on vtables. We could potentially integrate WPD decision here. Even if we can't find a single devirtualization target, we can compute the set of all possible targets of virtual calls during the thin link and could use that information to basically merge the attributes from all possible targets. But probably best to leave that as a future refinement as it will take some additional work to get that hooked up. We'd still need to be conservative for virtual calls when we don't have the necessary type info (when -fwhole-program-vtables not enabled or we don't have whole program visibility on the vtable defs), or for non-virtual indirect calls.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
383 ↗(On Diff #368178)

I'm not sure how this case could be happening as we haven't actually done the importing that would create these new available externally copies yet - that happens in the LTO backends, during the thin link we just add them to import lists.

388 ↗(On Diff #368178)

There is no prevailing copy presumably because the prevailing copy is in a native library being linked? I think these cases can be handled conservatively.

413 ↗(On Diff #368178)

You can just unconditionally do the getBaseObject call on the GVS without casting to AliasSummary. For non-AliasSummary it will just return itself.

419 ↗(On Diff #368178)

I think this checking for virtual calls will only work if -fwhole-program-vtables is enabled for whole program devirtualization or CFI. Otherwise we don't have the type tests that cause this to get populated. This also won't detect non-virtual indirect calls.

443 ↗(On Diff #356881)

Since the linker which invokes this should have been passed all objects to link, bitcode and native, it can do symbol resolution across all of them. So if there is an overriding native strong symbol, it should see that and the bitcode resolution would be non-prevailing and all bitcode copies marked dead (in computeDeadSymbols). So I think the weak any and linkonce any case can take the prevailing copy.

modimo marked 3 inline comments as done.Aug 26 2021, 9:31 PM

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

For speculative devirtualization aka ICP, we will still be left with a fallback indirect call, so it would still need to be treated conservatively. The extra edges added for ICP promotion candidates shouldn't be a problem or affect this.

Ah good point. I was thinking it may pessimize the propagation because of having to process all of these edges this is a no-go because of the fallback.

Note that with class hierarchy analysis we can do better for virtual calls, e.g. when -fwhole-program-vtables is enabled for whole program devirtualization and we have the necessary whole program visibility on vtables. We could potentially integrate WPD decision here. Even if we can't find a single devirtualization target, we can compute the set of all possible targets of virtual calls during the thin link and could use that information to basically merge the attributes from all possible targets. But probably best to leave that as a future refinement as it will take some additional work to get that hooked up. We'd still need to be conservative for virtual calls when we don't have the necessary type info (when -fwhole-program-vtables not enabled or we don't have whole program visibility on the vtable defs), or for non-virtual indirect calls.

Agreed, it's an engineering problem more than anything. I ran an optimistic build (same revisions as before, release + noinline) where indirect and virtual calls were assumed to always propagate (thinlto_prop_optimistic) and the effect in Clang self-build at least is not too large:

thinlto_base/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 217515,
"dwarfehprepare.NumNoUnwind": 299126,
"dwarfehprepare.NumUnwind": 332785,

thinlto_prop/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 158372,
"dwarfehprepare.NumNoUnwind": 420918,
"dwarfehprepare.NumUnwind": 210870,

thinlto_prop_optimistic/

"dwarfehprepare.NumCleanupLandingPadsRemaining": 154958,
"dwarfehprepare.NumNoUnwind": 425893,
"dwarfehprepare.NumUnwind": 205889,

(425893-420918)/(420918-299126) = 4% boost over being conservative and correct. This may change in real workloads though so I added a thinlto-funcattrs-optimistic-indirect flag for easy measurement.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
383 ↗(On Diff #368178)

I added the test funcattrs-prop-exported-internal.ll that demonstrates this. The summary has its internal linkage converted to external in thinLTOResolvePrevailingInIndex then converted to AvailableExternally in thinLTOResolvePrevailingGUID. Currently being handled conservatively since a prevailing copy doesn't exist.

388 ↗(On Diff #368178)

Yeah the prevailing copy is in the native binary.

This is a C++ specific feature which has ODR and these are already being propagated/inlined from in pre-link. The current thinlink propagation implementation is conservative because a prevailing copy doesn't exist. Currently being handled conservatively since a prevailing copy doesn't exist.

419 ↗(On Diff #368178)

I see. I added hasUnknownCall as an explicit flag for all indirect calls that should capture both cases.

443 ↗(On Diff #356881)

That makes it much easier and everything folds into the prevailing case! Changed and added a test for it.

modimo updated this revision to Diff 369024.Aug 26 2021, 9:32 PM
modimo marked 2 inline comments as done.

Use prevailing for linkonce/weak. Add hasUnknownCall to model virtual and indirect calls

modimo updated this revision to Diff 369158.Aug 27 2021, 12:06 PM

Check for CachedAttributes count in map rather than value so conservative results are not re-calculated

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

What about the other instruction checks done in Instruction::mayThrow i.e. http://llvm-cs.pcc.me.uk/lib/IR/Instruction.cpp#592?

modimo added a comment.EditedSep 9 2021, 8:14 PM

Good point on indirect calls. Rather than add a bit to the summary, can the flags just be set conservatively in any function containing an indirect call when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon from B without knowledge that the indirect call exists. Right now I've got a FunFlags hasUnknownCall which marks these as non-propagatable.

Ah, because there isn't a conservative setting of the flag...which raises a larger issue (but maybe I am completely missing something) - how do we distinguish between the NoUnwind summary flag not being set because we don't know yet (in which case we want the propagation from callees), vs because it cannot be NoUnwind (because there is a throw in the function)? Do we need to have a second flag indicating that a function contains a mayThrow instruction (other than calls, which are being handled by the propagation)?

Only call instructions can throw (what ultimately performs the throw operation is an opaque call to __cxa_throw()) which simplifies the problem. If all calls are known, we only need to examine the callees for accurate propagation.

What about the other instruction checks done in Instruction::mayThrow i.e. http://llvm-cs.pcc.me.uk/lib/IR/Instruction.cpp#592?

Good point! CleanupReturnInst and CatchSwitchInst are windows SEH specific representations for asynchronous exceptions but definitely should be covered for correctness. For ResumeInst it's the "return" of a landing pad and in order for a landing pad to be reachable AFAIK an invoke must exist so is captured by the call graph. I'll add a scan for Instruction::mayThrow in summary building. Having a mayThrow flag or making NoUnwind a tri-state flag in the summary makes sense to me to capture this case.

As a side note to why there's a check for ResumeInst at all: an invoke instructions actually never has "mayThrow" set. I haven't delved too deep but this could be changed to check for invokes instead since a dead landing pad at attribute inference time can lead to pessimization of NoUnwind in cases I've seen (alternatively, making sure CFG opts run before this to make sure this is pruned away).

modimo updated this revision to Diff 372398.Sep 13 2021, 9:47 PM

Add mayThrow flag

Ok thanks. I need to go through the propagation code and tests again more closely now, but one question/suggestion below in the meantime.

llvm/include/llvm/IR/ModuleSummaryIndex.h
580 ↗(On Diff #372398)

Now that we have MayThrow, can we avoid a separate hasUnknownCall bool and just conservatively set MayThrow true in that case?

Ok thanks. I need to go through the propagation code and tests again more closely now, but one question/suggestion below in the meantime.

Thanks!

llvm/include/llvm/IR/ModuleSummaryIndex.h
580 ↗(On Diff #372398)

hasUnknownCall is used for norecurse and other future flags as well to stop propagation.

tejohnson added inline comments.Sep 16 2021, 6:14 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
580 ↗(On Diff #372398)

Ah that makes sense.

Thanks for your patience, finally had a chance to go through everything much more carefully. Looks good, mostly a bunch of small or nitpicky final suggestions. The main comment/question of significance relates to where hasUnknownCall is being set currently.

Patch title and summary need an update.

clang/test/CodeGen/thinlto-funcattr-prop.ll
14 ↗(On Diff #372398)

This is checking the summary generated by opt, not the result of the llvm-lto2 run.

llvm/include/llvm/IR/ModuleSummaryIndex.h
575 ↗(On Diff #372398)

No Unwind needs a comment. And probably a note that it will be updated by function attr propagation. Depends on how we want to handle inline asm calls and other cases that currently set this true below (see my comment there).

580 ↗(On Diff #372398)

nit, maybe change this to hasIndirectCall which I think is more specific?

llvm/include/llvm/LTO/LTO.h
26 ↗(On Diff #372398)

Is this needed?

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
379 ↗(On Diff #372398)

Should this be moved below the following checks for inline asm and direct calls? (Not sure what the direct calls case is given that we handle direct calls to "known functions" above though).

If it should stay where it is and treat the below cases as unknown, probably should add tests for them.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
342 ↗(On Diff #372398)

Suggest renaming calculateDefinitiveAttributes and CachedAttributes to something like calculatePrevailingSummary and CachedPrevailingSummary which are more accurate now.

484 ↗(On Diff #372398)

You've already set InferredFlags.NoUnwind to false above this loop in the case where MayThrow was set on the CallerSummary.

499 ↗(On Diff #372398)

I think you can remove this and the below setNoUnwind() call on CachedAttributes[V] since presumably this points to one of the function summaries we update in the below loop.

531 ↗(On Diff #372398)

nit: suggest "Insert propagated function attributes from the Index ..."

537 ↗(On Diff #372398)

Consider consolidating this function with thinLTOResolvePrevailingInModule, to reduce the number of walks of the module and lookups into the DefinedGlobals map.

llvm/test/ThinLTO/X86/funcattrs-prop-indirect.ll
3 ↗(On Diff #372398)

Have a second version that tests with -thinlto-funcattrs-optimistic-indirect? I don't see a test for that option anywhere. Or maybe just remove that option - is it really needed?

9 ↗(On Diff #372398)

Perhaps this CHECK-NOT should just look for "Function Attrs:" as you do in some other tests below, in case some other attr is ever added to the IR that isn't related to this propagation, which could allow the CHECK-NOT to succeed for the wrong reasons?

llvm/test/ThinLTO/X86/funcattrs-prop.ll
8 ↗(On Diff #372398)

Since linkonce and weak are interposable, it isn't really correct to say that individual callers may optimize using different copies (we try to prevent this in the compiler since the are interposable).

29 ↗(On Diff #372398)

s/recursing/throwing/ on these 2 comments?

63 ↗(On Diff #372398)

Suggest putting comments above this one and call_weak_may_unwind below to indicate why one gets the nounwind and the other doesn't (i.e. that the thin link command above selects as prevailing the nounwind version of linkonce_may_unwind from b.ll and the may throw version of weak_may_unwind from c.ll)

76 ↗(On Diff #372398)

For clarity on what this is actually testing, suggest renaming these as ATTR_NOUNWIND and ATTR_MAYTHROW, or something like that (they are both norecurse, so the current name is a little misleading as to what is being checked).

tejohnson added inline comments.Sep 22 2021, 10:45 AM
llvm/include/llvm/IR/ModuleSummaryIndex.h
575 ↗(On Diff #372398)

Woops, the second sentence here "Depends on how we want to handle inline asm calls and other cases that currently set this true below (see my comment there)." I meant to add to the comment further below about the name of hasUnknownCall.

modimo updated this revision to Diff 374431.Sep 22 2021, 8:10 PM
modimo marked 23 inline comments as done.

Address feedback, rename funcattrs-prop-indirect.ll to funcattrs-prop-unknown.ll

clang/test/CodeGen/thinlto-funcattr-prop.ll
14 ↗(On Diff #372398)

Fixed.

llvm/include/llvm/IR/ModuleSummaryIndex.h
580 ↗(On Diff #372398)

My thinking is that the flag is a catch-all for blocking propagation and could conceivably be set for other reasons. It also matches the existing usage in FunctionAttrs.cpp for local propagation which also sets this for functions that are OptNone.

llvm/include/llvm/LTO/LTO.h
26 ↗(On Diff #372398)

Yeah, thinLTOPropagateFunctionAttrs resides in FunctionAttrs.h and runThinLTO calls it to propagate.

llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
379 ↗(On Diff #372398)

Any call that isn't emitted to the summary CallGraphEdges is a hole in propagation knowledge.

Direct calls case is from https://reviews.llvm.org/D40056 which is handling:

; Test calls that aren't handled either as direct or indirect.
call void select (i1 icmp eq (i32* @global, i32* null), void ()* @f, void ()* @g)()

Neat that select can be consolidated into a call, though I wonder if it should be allowed given it could be canonicalized to be another IR instruction above it and maybe eliminate this edge case.

Tangent aside, since in all these cases the call isn't part of the static callgraph HasUnknownCall needs to be set for correctness. Tests added in funcattrs-prop-unknown.ll (replacing funcattrs-prop-indirect.ll since we're handling more than just indirect here).

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
484 ↗(On Diff #372398)

Good catch, this case should be querying CalleeSummary MayThrow.

499 ↗(On Diff #372398)

Makes sense, removed. I like keeping the stats/debug tracking around though.

537 ↗(On Diff #372398)

Good idea, merged and renamed thinLTOResolvePrevailingInModule to thinLTOFinalizeInModule

llvm/test/ThinLTO/X86/funcattrs-prop-indirect.ll
3 ↗(On Diff #372398)

Good point, option removed.

llvm/test/ThinLTO/X86/funcattrs-prop.ll
8 ↗(On Diff #372398)

True, this comment is a left-over from the first interpretation of linkage models which have since been fixed with your help :). I'll update this to reflect that only prevailing matters here.

modimo retitled this revision from [ThinLTO] Add norecurse function attribute propagation to [ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation.Sep 22 2021, 8:34 PM
modimo edited the summary of this revision. (Show Details)

Few follow ups below.

clang/test/CodeGen/thinlto-funcattr-prop.ll
16 ↗(On Diff #374431)

I believe this corresponds to call_extern - why aren't we getting noRecurse and noUnwind propagated here?

(also, suggest adding a comment above each of these summaries as to what function name they correspond to)

llvm/include/llvm/Transforms/IPO/FunctionImport.h
224 ↗(On Diff #374431)

Suggest either removing default since you are always passing this argument, or default it to true and stop passing it in the places where it is true (since generally we want this to be true except in a few ThinLTOCodeGenerator.cpp locations that are testing specific things that don't involve propagation). Some preference for the former option (removing default), to make sure any new callers that get added think through the appropriate value.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
480 ↗(On Diff #374431)

Please make sure one of the may throw propagation tests would fail without this fix (i.e. when it was checking the caller's maythrow setting).

llvm/lib/Transforms/IPO/FunctionImport.cpp
1110 ↗(On Diff #374431)

Can this be merged with updateLinkage so we only do the DefinedGlobals lookup once per symbol?

llvm/test/ThinLTO/X86/funcattrs-prop.ll
6 ↗(On Diff #374431)

Nit, line length

modimo updated this revision to Diff 374723.Sep 23 2021, 9:17 PM
modimo marked 3 inline comments as done.

Address follow-ups

clang/test/CodeGen/thinlto-funcattr-prop.ll
16 ↗(On Diff #374431)

Tracing through llvm-lto2 the index is written out by CombinedIndexHook before the rest of thinlink including attribute propagation takes place. The attributes do end up successfully getting propagated, I'll add a check for that in the *1.promote.bc which shows the outcome of the attributes being propagated.

Good idea, added the function name that correspond to each summary.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
480 ↗(On Diff #374431)

Thinking more on why this didn't manifest strange behavior: because of the BU order of call-graph traversal any callee that has mayThrow will have its inferred noUnwind set to false above. Checking again in the caller is redundant because the noUnwind property of the callee will be determined by its value of noUnwind only. I think removing this check completely makes sense.

I can think of a scenario where there are mayThrow instructions but the function is still marked noUnwind (noexcept function with a throw in it) but in that case it is safe to propagate upwards because any exception will fail to escape this callee and so checking mayThrow would actually be a pessimization. I added a case in funcattrs-prop-maythrow.ll to illustrate this.

llvm/lib/Transforms/IPO/FunctionImport.cpp
1110 ↗(On Diff #374431)

Sure, merged.

tejohnson accepted this revision.Sep 23 2021, 10:19 PM

lgtm (one minor comment issue noted below). Thanks!

clang/test/CodeGen/thinlto-funcattr-prop.ll
17 ↗(On Diff #374723)

Incomplete sentence, seems to be missing the rest of the explanation about when it is written.

This revision is now accepted and ready to land.Sep 23 2021, 10:19 PM
modimo updated this revision to Diff 374934.Sep 24 2021, 12:49 PM
modimo marked an inline comment as done.

Complete explanation in thinlto-funcattr-prop.ll, also fix up diff to contain all changes.

clang/test/CodeGen/thinlto-funcattr-prop.ll
17 ↗(On Diff #374723)

Nice catch, sentence is now complete.

This revision was landed with ongoing or failed builds.Sep 27 2021, 12:28 PM
This revision was automatically updated to reflect the committed changes.

Thanks for the thorough review @tejohnson! I'll do additional validation on FB code that uses exceptions and if that all looks good I'll send up a change to turn this default on with the findings.