This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Efficiency fix for writing type id records in per-module indexes
ClosedPublic

Authored by tejohnson on Aug 27 2018, 4:40 PM.

Details

Summary

In D49565/r337503, the type id record writing was fixed so that only
referenced type ids were emitted into each per-module index for ThinLTO
distributed builds. However, this still left an efficiency issue: each
per-module index checked all type ids for membership in the referenced
set, yielding O(M*N) performance (M indexes and N type ids).

This patch changes the writer so that when writing per-module indexes
we instead walk the referenced set, and write the corresponding type id
summary. Since the references are GUIDs but the type id map is indexed
by string, we utilize a map from GUID to type id map entry. This map was
already being created for in-process backends (for caching purposes),
so I moved it to the ThinLTO backend base class and pass it to the
bitcode index writer.

For writing combined indexes we don't build this map and fall back to
the old behavior, which minimized the number of callsites into the index
writer that required changes (but added an assert to check that we
always have the new map when writing per-module indexes where the
performance is going to be an issue).

For a large internal application, this reduced the thin link time by
almost 15%.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Aug 27 2018, 4:40 PM
tejohnson added inline comments.Aug 27 2018, 4:44 PM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

@pcc - do you remember why you used a vector as the map value type? Presumably there should be a 1-1 ratio between the GUIDs computed from type ids and the type id map entry, but is this to handle the potential for GUID conflicts?

pcc added a comment.Aug 29 2018, 1:52 PM

Did you consider changing the type id mapping data structure in ModuleSummaryIndex instead of creating a new one? Since we now use a mapping from guids in several places it seems sensible to do that. I think that it would work to change the type of TypeIdMap to something like:

DenseMap<GlobalValue::GUID, TinyPtrVector<std::pair<std::string, TypeIdSummary>>>
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

Yes, it was for guid conflicts.

In D51330#1218260, @pcc wrote:

Did you consider changing the type id mapping data structure in ModuleSummaryIndex instead of creating a new one? Since we now use a mapping from guids in several places it seems sensible to do that. I think that it would work to change the type of TypeIdMap to something like:

DenseMap<GlobalValue::GUID, TinyPtrVector<std::pair<std::string, TypeIdSummary>>>

Sure, I will do that. Looking at the use sites, I don't think we technically need to keep the name. The only place where we seem to really need it is when writing out the type id summaries to assembly, but it is nice from a readability perspective to have the name. It likely isn't going to be high overhead to keep both, so we might as well.

tejohnson added inline comments.Aug 30 2018, 2:05 PM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

These should be rare and from what I can tell, if we only track one type id per conflicting GUID (since we will also have the type id string to disambiguate), the worst that will happen is that we wouldn't get a resolution and it would result in conservative behavior. Is that correct? Or do we need to track them all for correct behavior?

I realized when looking at changing this that currently the assembly writer will not behave well in the presence of conflicts, since the slot ID for the summary entry is assigned by GUID.

If we can get by with it, the implementation would be simpler if we simply tracked one (and like I said, disambiguate via the std::string saved in the value).

tejohnson added inline comments.Aug 31 2018, 10:29 AM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

My new patch does away with the vector of summaries for each GUID. The interface to get a summary and the clients are changed to handle this - which I believe is conservative. If my interpretation of this as being conservative is incorrect please let me know.

Change the TypeIdMap in summary to be indexed by GUID.

A few tests changed because the order of serialization changed a bit
(iterating in GUID order instead of string order).

pcc added inline comments.Aug 31 2018, 11:21 AM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

There's no conservative behaviour for type id resolutions, they need to be correct or we will miscompile (e.g. by forming a CFI check incorrectly).

tejohnson added inline comments.Aug 31 2018, 11:36 AM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

Ok. I assumed that the return of the Unsat kind at the start of LowerTypeTestsModule::importTypeId when there is not a type id summary in the index would lead to conservative checking in that scenario. What situation is that handling?

pcc added inline comments.Aug 31 2018, 12:23 PM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

Unsat is for handling the case where there is a call with no possible targets. For example a vcall where all vtables of that type have been dead stripped. In that case we compile the check so that it always fails. The reason why we return unsat when there are no type id summaries is that the absence of any vtables with a type id would lead to no summary for that type id being created.

tejohnson added inline comments.Aug 31 2018, 12:54 PM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

I see, thanks for the explanation.

I suppose I should first fix the assembly printing for type ids so that the slot is based on the string and not on its GUID. Should be straightforward - I'll fix that first separately and update this patch to go to a vector of type summaries per GUID in the map afterwards.

tejohnson added inline comments.Sep 4 2018, 2:05 PM
include/llvm/IR/ModuleSummaryIndex.h
744 ↗(On Diff #162777)

Nevermind on this - I realized I needed a GUID-to-typeid map to fix this, so I decided to incorporate it into the change I was making here to the map type.

tejohnson updated this revision to Diff 163909.Sep 4 2018, 2:05 PM

Change to use a map to a vector of matching type id, summary pairs

vitalybuka added inline comments.Sep 24 2018, 1:51 PM
include/llvm/IR/ModuleSummaryIndex.h
760 ↗(On Diff #163909)

Have you considered multimap?

using TypeIdSummaryMapTy =
    std::multimap<GlobalValue::GUID, std::pair<std::string, TypeIdSummary>>;

Not sure, but maybe code could be a little bit simpler with it.

lib/IR/AsmWriter.cpp
2926 ↗(On Diff #163909)

So now we can have multiple entries here. Would be nice to reach this with a test?

2969 ↗(On Diff #163909)

Test?

vitalybuka accepted this revision.Sep 24 2018, 2:09 PM

LGTM regardless of your decision about my comment.

include/llvm/IR/ModuleSummaryIndex.h
26 ↗(On Diff #163909)

unused?

This revision is now accepted and ready to land.Sep 24 2018, 2:09 PM
tejohnson marked an inline comment as done.Sep 25 2018, 11:52 AM
tejohnson added inline comments.
include/llvm/IR/ModuleSummaryIndex.h
760 ↗(On Diff #163909)

Changed the code to use multimap.

lib/IR/AsmWriter.cpp
2926 ↗(On Diff #163909)

This requires identifying two strings that result in an MD5 collision, which should be extremely rare. I found an example on the web but it is in hex and does not convert to a valid string. So I'm not really sure how to test this.

2969 ↗(On Diff #163909)

Ditto.

tejohnson marked an inline comment as done.

Use multimap

This revision was automatically updated to reflect the committed changes.