This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Only compute imports for current module in FunctionImport pass
ClosedPublic

Authored by tejohnson on Apr 10 2016, 1:43 PM.

Details

Summary

The function import pass was computing all the imports for all the
modules in the index, and only using the imports for the current module.
Change this to instead compute only for the given module. This means
that the exports list can't be populated, but they weren't being used
anyway.

Note that if we want to use the export lists to control linkage changes,
it would be better to compute them a single time in the thin link step
and mark any required changes in the index.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 53183.Apr 10 2016, 1:43 PM
tejohnson retitled this revision from to [ThinLTO] Only compute imports for current module in FunctionImport pass.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Apr 10 2016, 2:09 PM

About your note: ComputeCrossModuleImport() *is* the API I am calling in the Thin-link step.

The export list is gonna be needed to drive a "smart" promotion/renaming, so I expect it to be needed at the same stage you need the import list.
What will be the need for this?

lib/Transforms/IPO/FunctionImport.cpp
313 ↗(On Diff #53183)

Should be refactored with ModuleSummaryIndex::collectDefinedFunctionsPerModule() after D18908, the code looks close.

In D18945#396592, @joker.eph wrote:

About your note: ComputeCrossModuleImport() *is* the API I am calling in the Thin-link step.

Right, I haven't changed that callsite and that module will also be called from the gold-plugin.

The export list is gonna be needed to drive a "smart" promotion/renaming, so I expect it to be needed at the same stage you need the import list.
What will be the need for this?

The callsite in the function importer (see the last change in this file) uses the new interface. It doesn't need to recompute all imports for all modules. Should reduce compile time in the back ends that use this pass.

In D18945#396592, @joker.eph wrote:

About your note: ComputeCrossModuleImport() *is* the API I am calling in the Thin-link step.

Right, I haven't changed that callsite and that module will also be called from the gold-plugin.

Just to clarify, it can be called from the gold plugin to compute the imports and exports to:

  • guide importing (when the backend threads are launched from gold, similar to what you are doing in the libLTO case
  • to optimize linkage (potentially marking changes in the index for the distributed backend case)
  • to determine how to stage bitcode input files onto distributed build machines.

The export list is gonna be needed to drive a "smart" promotion/renaming, so I expect it to be needed at the same stage you need the import list.
What will be the need for this?

The callsite in the function importer (see the last change in this file) uses the new interface. It doesn't need to recompute all imports for all modules. Should reduce compile time in the back ends that use this pass.

This is particularly for the distributed build case where the importing is done in the back end processes.

This is particularly for the distributed build case where the importing is done in the back end processes.

This is the part I'm not convinced yet, because of the part where I wrote "The export list is gonna be needed to drive a "smart" promotion/renaming, so I expect it to be needed at the same stage you need the import list."

In D18945#396617, @joker.eph wrote:

This is particularly for the distributed build case where the importing is done in the back end processes.

This is the part I'm not convinced yet, because of the part where I wrote "The export list is gonna be needed to drive a "smart" promotion/renaming, so I expect it to be needed at the same stage you need the import list."

Note you can still do this from libLTO where you are calling ComputeCrossModuleImport and renameModuleForThinLTO directly (a single time per link).

For the distributed back end case, I want to avoid doing redundant work in thousands of back ends, which it currently is doing unnecessarily. As mentioned, the smart promotion could be done by invoking the full ComputeCrossModuleImport in the thin link step and marking the updated linkages in the index for the backends to adjust.

Note you can still do this from libLTO where you are calling ComputeCrossModuleImport and renameModuleForThinLTO directly (a single time per link).

(the once per link is obviously meant for the former, not the per-module renameModuleForThinLTO...)

I suspect the most efficient implementation for a distributed is that the ThinLink would provide for each file:

  • the list of imports (module id + list of functions)
  • the list of export

This is exactly what you need to send to the backend, not the index itself.

In D18945#396620, @joker.eph wrote:

I suspect the most efficient implementation for a distributed is that the ThinLink would provide for each file:

  • the list of imports (module id + list of functions)
  • the list of export

This is exactly what you need to send to the backend, not the index itself.

In some ways, but I'd like to enable more flexibility in the backends (locals have to be handled suitably conservatively if we promote based on the link time decisions).

If we eventually decide to go that route, passing this info from the link step to the backends requires additional/new plumbing (maybe write out a subset of the index with just the desired imports, e.g.). In that case, the invocation of ComputeCrossModuleImportForModule here goes away completely in that model, and I don't see a need to compute the lists for all modules redundantly until then.

lib/Transforms/IPO/FunctionImport.cpp
313 ↗(On Diff #53183)

Meaning refactor this out into a ModuleSummaryIndex method as you did with the version of this loop from ComputeCrossModuleImport()? Sure, I can do that. Note this version is different than the one from ComputeCrossModuleImport (which D18908 pulls out into collectDefinedFunctionsPerModule) in that it is looking for a specific module path, and is only building a std::map, not StringMap<std::map>.

tejohnson updated this revision to Diff 53392.Apr 12 2016, 7:41 AM
tejohnson edited edge metadata.

Refactor function info map build into collectDefinedFunctionsForModule
method on the ModuleSummaryIndex, similar to the refactoring done in
D18908 for the full-index case.

To give an idea of the impact of this change: I have a large application
containing 13719 source modules, and a 114M module summary index with
2283915 function summaries and 1166281 globalvar init summaries.
For a single source module's ThinLTO distributed backend compile step,
the Function Importing pass time before this change was 46.7s, and after
this change was only 0.3s. And that ~46s cost was being paid by all
13719 source modules!

mehdi_amini accepted this revision.Apr 12 2016, 11:58 AM
mehdi_amini edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Apr 12 2016, 11:58 AM
This revision was automatically updated to reflect the committed changes.