This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Avoid temporaries when loading global decl attachment metadata
ClosedPublic

Authored by tejohnson on Sep 19 2020, 10:30 AM.

Details

Summary

When performing ThinLTO importing, the metadata loader attempts to lazy
load, by building an index. However, module level global decl attachment
metadata was being parsed early while building the index, since the
associated (module level) global values aren't materialized on demand.
This results in the creation of forward reference temporary metadatas,
which are expensive.

Normally, these module level global values don't have much attached
metadata. However, in the case of -fwhole-program-vtables (e.g. for
whole program devirtualization), the vtables may have many attached type
metadatas. This was resulting in very slow performance when performing
ThinLTO importing with the default lazy loading.

This patch restructures the handling of these global decl attachment
records, delaying their parsing until after the lazy loading index has
been built. Then the parser can use the interface that loads from the
index, which resolves forward references immediately instead of creating
expensive temporaries.

For one ThinLTO backend that imports from modules containing huge
numbers of vtables and associated types, I measured the following
compile times for the metadata materialization during function
importing, rounded to nearest second:

No -fwhole-program-vtables:

Lazy loading on (head):  1s
Lazy loading off (head): 3s
Lazy loading on (patch): 1s

With -fwhole-program-vtables:

Lazy loading on (head):  440s
Lazy loading off (head): 4s
Lazy loading on (patch): 2s

Diff Detail

Event Timeline

tejohnson created this revision.Sep 19 2020, 10:30 AM
tejohnson requested review of this revision.Sep 19 2020, 10:30 AM

I realized after intentionally suppressing the new method that there was no regression test that broke. It turns out that the lazy loading index is only emitted if there is a certain number of metadatas, which the tests that exercise this code apparently don't have. Therefore, I added an option to drop the threshold for one of the ThinLTO WPD tests. This exposed a bug: Because parseGlobalObjectAttachment now resolves forward references via the index, the index cursor has moved when it returns, and we weren't parsing all of the global decl attachment metadatas. Fixed by saving and restoring the current position around that call.

My compile time test didn't expose this because it was NDEBUG. I confirmed that it now parses all the expected metadata records. The metadata materialization time under -fwhole-program-vtables goes from 1s up to 2s. Will update the summary accordingly.

tejohnson edited the summary of this revision. (Show Details)Sep 19 2020, 4:11 PM
tejohnson updated this revision to Diff 292996.Sep 19 2020, 4:11 PM

Add test options that exercise code, fix exposed bug.

evgeny777 added inline comments.Sep 21 2020, 4:01 AM
llvm/lib/Bitcode/Reader/MetadataLoader.cpp
894

So this is not called during ThinLTO import, is it?

2115

Why is this change needed?

tejohnson added inline comments.Sep 21 2020, 7:44 AM
llvm/lib/Bitcode/Reader/MetadataLoader.cpp
894

It is only called during ThinLTO importing. See the if conditions for the block of code at its callsite (line 989)

2115

Because getMDNodeFwdRefOrNull unconditionally creates a forward reference with temporary when the requested metadata is not yet loaded. Whereas getMetadataFwdRefOrLoad will use the lazy loading index built by lazyLoadModuleMetadataBlock to perform the load rather than create a temporary. With the code before this patch, that could not be done as the global decl attachment metadatas were being parsed also by lazyLoadModuleMetadataBlock, so the index wasn't available. And without making the change to this callsite, the patch doesn't have any measureable effect since we won't use the index even when we have it (which I discovered when I first made the change to restructure into loadGlobalDeclAttachments but didn't change this callsite).

evgeny777 accepted this revision.Sep 21 2020, 8:36 AM

LGTM with nits

llvm/lib/Bitcode/Reader/MetadataLoader.cpp
916

nit: no need for {} here

929

nit: this is hard to read. May be change to somthing like this?

if (MaybeCode.get() != bitc::METADATA_GLOBAL_DECL_ATTACHMENT) {
      assert(NumGlobalDeclAttachSkipped == NumGlobalDeclAttachParsed);
      return true;
}

// Parse stuff here
2115

Thanks, now clear

This revision is now accepted and ready to land.Sep 21 2020, 8:36 AM
tejohnson marked 2 inline comments as done.Sep 22 2020, 8:31 PM
tejohnson updated this revision to Diff 293633.Sep 22 2020, 8:31 PM

Address comments

This revision was landed with ongoing or failed builds.Sep 22 2020, 8:40 PM
This revision was automatically updated to reflect the committed changes.

This caused a -fsanitize=cfi issue in thinltoBackend. I reverted the patch temporarily and shared a detailed reproduce with Teresa

% clang ...
fatal error: error in backend: Invalid abbrev number
tejohnson reopened this revision.Sep 25 2020, 10:07 AM

PTAL

I have a fix for the issue that required a revert. We shouldn't be using the lazy loading IndexCursor to load the global decl attachment metadata. That cursor is initialized with the necessary abbrev ids from the module level metadata block at the end of the lazy loading index setup, so that you can use those abbrev ids when jumping into random locations, as noted in the comment above its declaration:

/// Cursor associated with the lazy-loading of Metadata. This is the easy way
/// to keep around the right "context" (Abbrev list) to be able to jump in
/// the middle of the metadata block and load any record.

This is only exposed in limited situations, since there are almost no abbrev ids used in the metadata block (basically only debug location records), and you need a debug location that used in certain ways and shared by multiple functions so that it ends up in the module level metadata block for lazy loading. I beefed up the test case to contain metadata on the imported module that exposes the failure without this fix.

This revision is now accepted and ready to land.Sep 25 2020, 10:07 AM

Fix bug and add test case.

tejohnson requested review of this revision.Sep 25 2020, 10:09 AM

Ping - could someone take a quick look at the update and approve the recommit?

ping, @evgeny777 can you take a quick look at the change. It's a pretty minor fix.

This revision is now accepted and ready to land.Oct 12 2020, 1:27 AM