This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Do metadata linking during batch function importing
ClosedPublic

Authored by tejohnson on Jan 21 2016, 11:55 AM.

Details

Summary

Since we are currently not doing incremental importing there is
no need to link metadata as a postpass. The module linker will
only link in the imported subroutines due to the functionality
added by r256003.

(Note that the metadata postpass linking functionalitiy is still
used by llvm-link, and may be needed here in the future if a more
incremental strategy is adopted.)

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 45576.Jan 21 2016, 11:55 AM
tejohnson retitled this revision from to [ThinLTO] Do metadata linking during batch function importing.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
mehdi_amini added inline comments.Jan 21 2016, 12:54 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45576)

I'd question why did we load them with lazy metadata in the first place? If we have materialize the metadata anyway, we can as well load them up-front.

tejohnson added inline comments.Jan 21 2016, 1:11 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45576)

I think delaying it until here should save memory - otherwise we have all metadata for all source modules in memory at once. Materializing it here just before we link, after which the SrcModule is destroyed, should mean we only have one source module worth of metadata materialized and in memory at once (other than what is linked into the dest module, which will be a subset of the metadata, even smaller once I stop linking in the other things hanging off the DICompileUnit).

mehdi_amini added inline comments.Jan 21 2016, 1:42 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45576)

I'm fine with leaving the call to materializeMetadata as it will turn to be a no-op if they were loaded up-front.
Can you reflect this in the comment above? Just prepending an "If" at the beginning looks good to me.

mehdi_amini added inline comments.Jan 21 2016, 1:44 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45576)

Actually, I have another idea: why not load metadata on demand when materializing a function?
This seems quite easy to achieve, and would lead us to a state where we would have everything we need materialized, and nothing more.

tejohnson added inline comments.Jan 21 2016, 2:04 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45576)

That has the same disadvantages as not lazy loading in the first place - since we will materialize each function when we decide to import it (in order to look for external calls), we will end up materializing all metadata for all source modules we're importing from and holding them in memory at the same time.

tejohnson updated this revision to Diff 45592.Jan 21 2016, 2:07 PM
  • Update comments.
mehdi_amini added inline comments.Jan 21 2016, 2:17 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45592)

Not at the same scale: my point is to materialize only the metatadata referenced by the function being materialized, so it is not the same as materializing *all* the metadata: if we import one function from a module we would be able to materialize only the needed metadata.

tejohnson added inline comments.Jan 21 2016, 2:25 PM
lib/Transforms/IPO/FunctionImport.cpp
308 ↗(On Diff #45592)

Oh I see. I thought you were suggesting that I move this call to materializeMetadata into the materialize() call for a function.

What you are suggesting would be interesting, but right now isn't supported as metadata materialization involves parsing all of the module level metadata. Presumably with the approach you are describing it would need to be parsed for each function materialized and somehow only the records of interest would be recorded? The metadata is correlated to references by the index into the MetadataList, which is populated while parsing. You won't even know which metadata values you need until you parse the function, at which point you have already created the forward references. Sorry if I am missing something.

mehdi_amini edited edge metadata.Jan 21 2016, 2:37 PM

What I have in mind is to rememberAndSkipMetadata() when parsing the module. But embedded the implementation in BitcodeReaderMetadataList so that whenever we call getValueFwdRef we don't create a placeholder (if the ID is one of the module-level constant) but parse the metadata on-demand. I should be able to craft something quickly.

In D16424#332826, @joker.eph wrote:

What I have in mind is to rememberAndSkipMetadata() when parsing the module. But embedded the implementation in BitcodeReaderMetadataList so that whenever we call getValueFwdRef we don't create a placeholder (if the ID is one of the module-level constant) but parse the metadata on-demand. I should be able to craft something quickly.

Note that the metadata value ids are assigned implicitly based on the order of the metadata records. Will you only parse the needed records, skipping the intermediate ones, while counting metadata value ids that would be assigned by skipped MD records that assign values (all but METADATA_NAME)? You'd need to go back and parse the skipped records on backwards references within the metadata. Otherwise, you'd have to parse all MD records up through the ones needed for the imported function (presumably leading to overhead from saving them, unless you can remove the ones that weren't actually needed by the imported function - but then they may need to be reparsed later for a different imported function).

In the meantime, should we put this patch in, since it is a simple change and should improve your situation give the current bulk importing? Most of this is needed regardless of whether you are able to parse just the needed metadata.

mehdi_amini accepted this revision.Jan 21 2016, 4:14 PM
mehdi_amini edited edge metadata.

LGTM: strictly better than before.

This revision is now accepted and ready to land.Jan 21 2016, 4:14 PM
This revision was automatically updated to reflect the committed changes.