This is an archive of the discontinued LLVM Phabricator instance.

Apply summary-based dead stripping to regular LTO modules with summaries.
ClosedPublic

Authored by pcc on Jun 5 2017, 5:22 PM.

Details

Summary

If a regular LTO module has a summary index, then instead of linking
it into the combined regular LTO module right away, add it to the
combined summary index and associate it with a special module that
represents the combined regular LTO module.

Any such modules are linked during LTO::run(), at which time we use
the results of summary-based dead stripping to control whether to
link prevailing symbols.

Depends on D33921

Event Timeline

pcc created this revision.Jun 5 2017, 5:22 PM
tejohnson added inline comments.Jun 9 2017, 9:02 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
694

The patch adds a couple different new types named Module - is it possible to use something else rather than overloading that name?

697

Update description of return value.

llvm/include/llvm/LTO/LTO.h
287

Can you define this structure here in the header? I think it would be clearer.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
724

I'm a bit confused about the relationship between ModulePath and Mod. It seems from the code in addThisModule() that Mod is being used to save on insertion attempts into the table, or is there another reason? If just an optimization, can that be committed separately?

I don't quite understand the last part of the comment "if it is expecting to read a per-module summary index". Do you mean it is only filled in when we are (in the process of) reading a per-module summary index?

4695

I take it we were always guaranteed to have ModuleId == 0 when we called addThisModulePath?

Also, previously we would repeatedly try to re-add the module path - is saving the result of addModule in the reader and checking it here just an optimization, or required because of the other changes?

5509

It wasn't clear to me when I first looked at this part of the patch why the ModuleIdentifier on the BitcodeModule wouldn't be used, as it is in getSummary below (or was here before). I think it would be clearer to pass in a flag indicating that this is a regularLTO summary, and then here either use the ModuleIdentifier or "", depending on the flag.

llvm/lib/LTO/LTO.cpp
605

I think the old comment is more accurate (we can link it as available_externally, but only need to if it is undefined). Perhaps just add that we will decide later when we link this module, based on whether it is undefined.

tejohnson added inline comments.Jun 10 2017, 7:15 AM
llvm/lib/LTO/LTO.cpp
509

Why not just link all of the regular LTO objects during runRegularLTO? I guess you still need to keep track of which had summaries though. How much of addRegularLTO could then be deferred until runRegularLTO (i.e. keeping track of the resolution info, like we do for ThinLTO, but delaying building the Module, computing Keep, etc)? It seems conceptually simpler to do all the linking in one place.

pcc added inline comments.Jun 12 2017, 2:32 PM
llvm/include/llvm/IR/ModuleSummaryIndex.h
694

I will rename the other one to AddedModule.

697

Will do

llvm/include/llvm/LTO/LTO.h
287

Will do

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
724

I think I originally wanted this to work by changing addModule so that it would always either create a module or error out rather than possibly returning an existing one, and then have the client always create the module on its own (so that it could control whether or not to create a module for each input file, and as a side effect we would get the optimization that you mention), but then I thought that that wouldn't necessarily work because the pointer could become invalidated by the StringMap (although now that I look at StringMap's rehash implementation I don't think that could actually happen). So I ended up moving the Module creation back into the reader.

I'd be fine with changing this back to be more like how it was and we can think about changing the API at a later point.

Regarding the comment, I mean that the caller (i.e. readSummary) will fill this field in as it is expecting to read a per-module summary index. I'll make this comment clearer if I make this change in a followup.

4695

Yes, if !Mod at this point we must have been called from getSummary for which the current behaviour is that the module id will be 0 if it is a per-module summary.

It's more of a pure optimization at this point, so I'll revert it as I mentioned on the other comment.

5509

I did it this way because I wanted LTO to handle regular LTO module summaries on its own (well, as much as possible I suppose). With that suggestion, we would move more of the handling into the bitcode reader, which I'm not entirely opposed to, but let me know what you think of that rationale.

llvm/lib/LTO/LTO.cpp
509

I think the first thing I tried was something along those lines, but the code turned out to be less clean in my opinion, precisely because there was more state to keep track of. I guess it's just a matter of opinion, but this seems cleaner to me, as there turned out to be a relatively clean boundary between preparing to link a module and linking it.

605

Will do

pcc marked 6 inline comments as done.Jun 12 2017, 5:16 PM
pcc added inline comments.
llvm/include/llvm/IR/ModuleSummaryIndex.h
697

Done (in D34124).

pcc updated this revision to Diff 102266.Jun 12 2017, 5:16 PM

Address review comments

eugenis added inline comments.Jun 13 2017, 2:08 PM
llvm/lib/LTO/LTO.cpp
512

Move this to ModuleSummaryIndex? I'm adding the same helper function to LowerTypeTests in D34168.

tobiasvk added inline comments.
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
5659–5662

So this will be a bit misleading if you change it as you suggested in D34156 to check just for ThinLTO. Can we rename it to hasThinLTOSummary, or maybe even get rid of it and instead create a getBitcodeLTOInfo(MemoryBufferRef) function so clients can figure things out directly from the LTOInfo?

pcc updated this revision to Diff 102615.Jun 14 2017, 3:38 PM
pcc marked 2 inline comments as done.
  • Address review comments
pcc added inline comments.Jun 14 2017, 3:38 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
5659–5662

Replaced with a getBitcodeLTOInfo function.

pcc updated this revision to Diff 102616.Jun 14 2017, 3:43 PM
  • Remove the old IsLiveByGUID function
tejohnson added inline comments.Jun 14 2017, 3:48 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
5509

Ok, but please document the fact that we are ignoring the ModuleIdentifier because of client-dependent behavior around the desired module path in the combined index.

llvm/lib/LTO/LTO.cpp
509

Ok. But is this affected by the change being discussed in D34156, where my understanding is that all regular LTO modules will have summaries?

pcc updated this revision to Diff 102620.Jun 14 2017, 4:18 PM
pcc marked an inline comment as done.
  • Add a comment explaining why we aren't using ModuleIdentifier
pcc marked an inline comment as done.Jun 14 2017, 4:18 PM
pcc added inline comments.
llvm/lib/LTO/LTO.cpp
509

After that change lands all LTO modules produced by clang will have summaries, so we will normally be following the new code path that defers the link step. But we will still need to be able to read modules without summaries for backwards compatibility reasons and in order to handle bitcode created by non-clang producers.

tejohnson accepted this revision.Jun 15 2017, 8:29 AM

LGTM

llvm/lib/LTO/LTO.cpp
509

I suspect we may eventually want to reorganize this so that regularLTO operates more like ThinLTO (read the summary and symbols here only, read the module later during runRegularLTO). But let's go forward with this approach for now, since it is a smaller delta compared to the current approach.

This revision is now accepted and ready to land.Jun 15 2017, 8:29 AM
This revision was automatically updated to reflect the committed changes.