This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] More efficient export computation (NFC)
ClosedPublic

Authored by tejohnson on Feb 2 2020, 10:09 AM.

Details

Summary

A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.

The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).

The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do
this is there was no way of telling from the export list whether it was
previously inserted there because its definition was exported (in which
case we already marked all its references as exported) from when it was
inserted there because it was referenced by another exported value (in
which case we haven't yet inserted its own references).

To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.

These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).

Some thin link user time measurements for one large application, average
of 5 runs:

With constant variable importing enabled:

  • without this patch: 479.5s
  • with this patch: 74.6s

Without constant variable importing enabled:

  • without this patch: 80.6s
  • with this patch: 70.3s

Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.

Diff Detail

Event Timeline

tejohnson created this revision.Feb 2 2020, 10:09 AM
evgeny777 added inline comments.Feb 3 2020, 2:42 AM
llvm/lib/Transforms/IPO/FunctionImport.cpp
680

On the average this will iterate over N/2 summaries in EI.getSummaryList(). Can this be replaced with:

auto *S = DefinedGVSummaries[EI.getGUID()];

? Probably the assetion above can be removed after that.

691

nit: you don't need {} for if statement

tejohnson marked 3 inline comments as done.Feb 3 2020, 8:45 AM
tejohnson added inline comments.
llvm/lib/Transforms/IPO/FunctionImport.cpp
680

Great idea! I measured this change and there might have been a small <1s speedup, although that's within the noise range.

tejohnson updated this revision to Diff 242094.Feb 3 2020, 8:46 AM
tejohnson marked an inline comment as done.

Address comments

evgeny777 accepted this revision.Feb 3 2020, 8:53 AM

LGTM, thanks!

This revision is now accepted and ready to land.Feb 3 2020, 8:53 AM
This revision was automatically updated to reflect the committed changes.