This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Import readonly vars with refs
ClosedPublic

Authored by evgeny777 on Oct 29 2019, 6:16 AM.

Details

Summary

This patch allows importing readonly variables which have initializer referencing other functions or variables. Referenced functions (or vars) are promoted but not imported.

This allows two things to be done:

  • Replacing indirect calls with direct calls to functions referenced in initializer
  • Additional DCE with -Wl,-mllvm,-function-sections -Wl,-mllvm,-data-sections, -Wl,--gc-sections linker flags

Diff Detail

Event Timeline

evgeny777 created this revision.Oct 29 2019, 6:16 AM

I can see why this is a good idea, but do you have numbers for this optimization? How much code can be inlined? Benchmark for speedup? code size? compile time impact?

lib/Transforms/IPO/FunctionImport.cpp
282 ↗(On Diff #226884)

I will just make function to be type bool hasRefsPreventingImport(const GlobalVarSummary*);. The callsite already has isa check so it will be much better to type casting there.

I can see why this is a good idea, but do you have numbers for this optimization? How much code can be inlined?

This patch doesn't directly affect inlining, though it can reduce number of IR instructions and make inline cost of a function lower.
LLVM test suite shows noticeable code size reduction of Bullet test case (~1%), because deallocation function pointer sFreeFunc is imported with initializer
and load instruction gets eliminated in few destructors (indirect call is converted to direct call). On the whole the change introduced by this patch is quite small
and I wouldn't expect much from it. It would probably work better for C programs, where function pointers in structures is common OO pattern.

Also it may become better with better readonly'ness detection (I have plans for it as well).

Benchmark for speedup?

I've observed slightly better execution time of MultiSource test suite over several runs, but this is most likely some sort of caching issue.

code size?

See above. There are minor size improvements for other MultiSource benchmarks, but too small to be considered.

compile time impact?

Not observed. Typical program contains few global variables (compared to number of functions), and very few of them are readonly. Among those few have non-trivial initializer.

evgeny777 updated this revision to Diff 227099.Oct 30 2019, 7:57 AM

Addressed. See comment above for performance insights

This revision is now accepted and ready to land.Oct 30 2019, 10:45 AM
tejohnson requested changes to this revision.Oct 30 2019, 11:27 AM

Thanks for working on this! I had been thinking about adding this recently but didn't have a chance yet. I have a few suggestions below. One thing that should be considered for a follow-up enhancement is to consider the refs themselves for importing (similar to the way we follow chains of calls when doing function importing).

lib/Transforms/IPO/FunctionImport.cpp
283 ↗(On Diff #227099)

The name of the maybeReadOnly is a little confusing here, although I know we expect that analysis to have finalized this flag by now. Can you add a flag to the summary index to indicate that the attribute propagation has completed (similar to the WithGlobalValueDeadStripping flag that indicates liveness analysis is complete), and then add a helper on the summary to return the maybeReadOnly value that asserts the flag is set, and change the uses sites to use that instead? E.g. something like a readOnly() method, and probably writeOnly() one for completeness.

318 ↗(On Diff #227099)

I think it would be better to call this from within canImportGlobalVar to keep the correctness checks centralized.

This revision now requires changes to proceed.Oct 30 2019, 11:27 AM
evgeny777 updated this revision to Diff 227288.Oct 31 2019, 8:58 AM

Addressed.

Adding new flag to summary index allowed better handling cases when actual dead symbol computation was skipped for some reason. See modifications in processGlobalForThinLTO and updated test cases.

One thing that should be considered for a follow-up enhancement is to consider the refs themselves for importing (similar to the way we follow chains of calls when doing function importing).

This has to be done with caution, because importing everything actually makes things worse due to promotion in source modules.
One thing which is worth doing is calculating references from initializer to constant object, e.g:

static const struct A { ... } objA = { ... };
struct B { A *pA; } objB = { &objA };

and import such references.

Another thing which seems also worth doing is computing functions which doesn't modify their pointer arguments. Calling such functions with
global object pointer doesn't violate readonly property. Also this could be a base for some other optimizations (like argument promotion).

tejohnson added inline comments.Nov 1 2019, 7:20 AM
lib/Bitcode/Reader/BitcodeReader.cpp
5853 ↗(On Diff #227288)

I don't follow the second check, since the WithGlobalValueDeadStripping flag was added to the index before attribute propagation was supported.

lib/IR/ModuleSummaryIndex.cpp
201 ↗(On Diff #227288)

I had forgotten that this was called during attribute propagation. Won't guarding this check with WithAttributePropagation do the wrong thing during importing if we somehow skipped attribute propagation (doesn't look like that can happen today, but let's say we add a debugging option in the future to allow that to be skipped). It would be safer to have this method take a flag that indicates whether we are doing this during attribute propagation, and if not then return true if !WithAttributePropagation.

205 ↗(On Diff #227288)

Update comment

lib/Transforms/IPO/FunctionImport.cpp
865 ↗(On Diff #227288)

Where/how does this happen? There are two callsites, with one using an empty set of preserved symbols (old LTO API), but they are for the old and new LTO APIs, and we would never be calling both.

It does look like this can potentially be called multiple times when testing through llvm-lto (because the testing facilities there like crossModuleImport(), internalize(), etc) will be called once per input file, but I don't see it otherwise being called twice. If this is just because of llvm-lto testing, can you update the comment to indicate that this is called multiple times when testing via the old LTO API (I don't see it happening via the new LTO API, but maybe I am missing something).

test/ThinLTO/X86/local_name_conflict.ll
15 ↗(On Diff #227288)

I assume this is what you were referring to in your comments about the change to processGlobalForThinLTO for cases when actual dead symbol computation was skipped for some reason, but I don't see why we would be skipping the dead symbol computation here - llvm-lto's import action invokes the old LTO crossModuleImport() which ultimately calls computeDeadSymbolsWithConstProp, and I'm not sure why that would do the attribute propagation but not the dead symbol computation.

evgeny777 marked 3 inline comments as done.Nov 1 2019, 9:02 AM
evgeny777 added inline comments.
lib/IR/ModuleSummaryIndex.cpp
201 ↗(On Diff #227288)

Makes sense.

lib/Transforms/IPO/FunctionImport.cpp
865 ↗(On Diff #227288)

computeDeadSymbols is currently called multiple times from llvm-lto with empty GUIDPreservedSymbols (see emit_imports.ll) and can *potemtially* be called with non-empty GUIDPreservedSymbols second time (although *currently* this doesn't happen). However if we add some boolean flag, like IsCalledFromPropagation to canImportGlobalVar then this doesn't really matter much.

test/ThinLTO/X86/local_name_conflict.ll
15 ↗(On Diff #227288)

This is because on some occasions WithGlobalValueDeadStripping can be false and WithAttibutePropagation can be true. One of such cases is when GUIDPreservedSymbols is empty. In such case we consider all symbols live. Having all symbols live doesn't however mean that some of the can't be readonly. That's why we see some additional internalizations happening.

evgeny777 marked an inline comment as done.Nov 5 2019, 1:56 AM
evgeny777 added inline comments.
lib/Bitcode/Reader/BitcodeReader.cpp
5853 ↗(On Diff #227288)

We have constant propagation in index since version 5 and if value of true of WithGlobalValueDeadStripping flag means constant propagation has run. I tend to remove this check for the sake of simplicity though.

evgeny777 updated this revision to Diff 227851.Nov 5 2019, 5:35 AM

Addressed and rebased

Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2019, 5:35 AM
tejohnson accepted this revision.Nov 5 2019, 8:45 AM

LGTM with some minor comment changes suggested. Thanks!

llvm/lib/IR/ModuleSummaryIndex.cpp
175–177

document constant param

llvm/lib/Transforms/IPO/FunctionImport.cpp
314

document const param

864

This comment is stale and can presumably be removed now, as the corresponding setWithAttributePropagation(false) has been removed, right?

This revision is now accepted and ready to land.Nov 5 2019, 8:45 AM
This revision was automatically updated to reflect the committed changes.