This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Move summary computation from BitcodeWriter to new pass
ClosedPublic

Authored by tejohnson on Apr 4 2016, 9:41 AM.

Details

Summary

This is the first step in also serializing the index out to LLVM
assembly.

The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).

Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.

There are a couple of NFC changes in here that I will commit separately
ahead of time (e.g. new getGlobalValueInfo() helper method).

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson updated this revision to Diff 52570.Apr 4 2016, 9:41 AM
tejohnson retitled this revision from to [ThinLTO] Move summary computation from BitcodeWriter to new pass.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.
tejohnson updated this revision to Diff 52695.Apr 5 2016, 8:35 AM

Merge with upstream HEAD. This pulls in the NFC changes I committed separately.

mehdi_amini edited edge metadata.Apr 5 2016, 12:37 PM

I think this introduces a distinction between the module level summary and the combined index, both in the memory representation and the usage (the client would deal differently with both).
I wonder if it is not a sign that some more restructuring could make it cleaner, with a separate class for the combined index and the module summary one.

Also I'm not totally sure why we're still using value id in the module summary case and GUID for the combined index?

include/llvm/IR/ModuleSummaryIndex.h
76 ↗(On Diff #52695)

getGUID might be clearer at call sites?

In D18763#392671, @joker.eph wrote:

I think this introduces a distinction between the module level summary and the combined index, both in the memory representation and the usage (the client would deal differently with both).
I wonder if it is not a sign that some more restructuring could make it cleaner, with a separate class for the combined index and the module summary one.

There's so much overlap between them that I don't know that separating them would be helpful.

Note that in the current upstream implementation I am actually writing value ids into the summary entries populated during bitcode writing in the per module case, they are essentially standing in for GUIDs in the per-module case, and no type change was required since they are unsigned ints and therefore the same uint64_t field works for either these value ids or the GUIDs in the combined index case.

Here I had to make the different representation in the per-module vs combined cases explicit since we don't yet have the value ids from the ValueEnumerator available.

Also I'm not totally sure why we're still using value id in the module summary case and GUID for the combined index?

Note that they *both* use a value id in the bitcode representation. In both cases the value ids are assigned by the bitcode writer:

  • In the per-module case this value id is assigned by the ValueEnumerator which is essentially private to the bitcode writer, and the value ids used for writing the summary must use those (since the summary leverages the same VST used by the rest of the bitcode).
  • In the combined case the value ids used in the combined VST are assigned to each GUID via a map populated as we write the combined summary entries. As mentioned on IRC yesterday, this is to be more compact that using a much larger GUID in all of the reference graph entries.

It doesn't make sense to force creation of a GUID in the per-module case since we somehow need to get back to the Value* to access the real value ID used in the VST from the ValueEnumerator.

include/llvm/IR/ModuleSummaryIndex.h
76 ↗(On Diff #52695)

Good idea

tejohnson updated this revision to Diff 52787.Apr 6 2016, 6:49 AM
tejohnson edited edge metadata.
  • Rename getId to getGUID
mehdi_amini added inline comments.Apr 6 2016, 10:41 PM
include/llvm/Analysis/ModuleSummaryAnalysis.h
40 ↗(On Diff #52787)

Pass argument requires some good documentation I think.

include/llvm/IR/ModuleSummaryIndex.h
123 ↗(On Diff #52787)

I'm not super enthusiastic about this: we're leaking the bitcode implementation details here. I don't have a great suggestion at that time unfortunately :(

lib/Analysis/ModuleSummaryAnalysis.cpp
124 ↗(On Diff #52787)

I think my representation of aliases will make this obsolete, just as all the "clone()" method on the GlobalValueInfo right?

126 ↗(On Diff #52787)

Using a pass as an argument is really not nice here, and also it is never passed at call site.
Can you pas instead a std::function<BlockFrequencyInfo *(Function &F)> that returns either a pointer to the BFI for the function F, or nullptr if not available?

144 ↗(On Diff #52787)

Why is this const_cast needed?

198 ↗(On Diff #52787)

I think you intended to pass an extra argument here.

lib/Bitcode/Writer/BitcodeWriterPass.cpp
25 ↗(On Diff #52787)

Here as well, extra argument.

tejohnson added inline comments.Apr 7 2016, 9:10 AM
include/llvm/Analysis/ModuleSummaryAnalysis.h
40 ↗(On Diff #52787)

Will change to call back and document.

include/llvm/IR/ModuleSummaryIndex.h
123 ↗(On Diff #52787)

Only in the sense that it motivates the need for a union. The constituent types (Value* and GUID) are not bitcode specific.

lib/Analysis/ModuleSummaryAnalysis.cpp
124 ↗(On Diff #52787)

Yes, this should go away along with the clone() support. I can remove all of this since your follow on will presumably go in shortly after this one.

126 ↗(On Diff #52787)

The not passing was inadvertent as you point out below. But I like the callback approach better.

144 ↗(On Diff #52787)

Because the BlockFrequencyInfo (or BranchProbabilityInfo, can't remember which one) takes a non-const F. When called through the pass the Module is not const. But when this class is constructed from llvm-as the Module is constant so the Module pointer is constant in the builder.

198 ↗(On Diff #52787)

Yes.

lib/Bitcode/Writer/BitcodeWriterPass.cpp
25 ↗(On Diff #52787)

No, this one not derived from Pass. As an aside, I have no idea in the new pass manager, which I believe is what BitcodeWriterPass is, how to do the equivalent of getAnalysis<>. So right now it will construct the BFI as it does when called via llvm-as.

mehdi_amini added inline comments.Apr 7 2016, 3:40 PM
include/llvm/IR/ModuleSummaryIndex.h
123 ↗(On Diff #52787)

Oh for some reason I thought we will end up with the value ID, pointers are fine.

lib/Analysis/ModuleSummaryAnalysis.cpp
144 ↗(On Diff #52787)

Ok I update BFI upstream, but DomTree seems a lot more involved...

You should be able to update to:

LoopInfo LI{DominatorTree(const_cast<Function &>(F))};
BranchProbabilityInfo BPI{F, LI};
BFIPtr = llvm::make_unique<BlockFrequencyInfo>(F, BPI, LI);
lib/Bitcode/Writer/BitcodeWriterPass.cpp
25 ↗(On Diff #52787)

ok

tejohnson added inline comments.Apr 7 2016, 10:18 PM
lib/Analysis/ModuleSummaryAnalysis.cpp
144 ↗(On Diff #52787)

Just saw that, thanks! Will do.

mehdi_amini added inline comments.Apr 8 2016, 9:33 AM
lib/Bitcode/Writer/LLVMBuild.txt
22 ↗(On Diff #52787)

I think it is worth mentioning in the commit message/description that this fixes this "dependency issue".

tejohnson added inline comments.Apr 8 2016, 2:07 PM
lib/Bitcode/Writer/LLVMBuild.txt
22 ↗(On Diff #52787)

Will do.

tejohnson updated this revision to Diff 53088.Apr 8 2016, 2:10 PM

Address review comments

Note that for now I have ripped out all the alias summary emission.
This will be added back with D18836, and it avoids a bunch of unnecessary
temporary handling of cloning. I have noted in the tests where they need
to be updated.

mehdi_amini accepted this revision.Apr 8 2016, 2:27 PM
mehdi_amini edited edge metadata.

LGTM (with two inline comments)

include/llvm/Analysis/ModuleSummaryAnalysis.h
32 ↗(On Diff #53088)

Why do you need a member for BFIFtor?

lib/Analysis/ModuleSummaryAnalysis.cpp
69 ↗(On Diff #53088)

I think this is something we want to get rid of right?

This revision is now accepted and ready to land.Apr 8 2016, 2:27 PM
tejohnson added inline comments.Apr 8 2016, 10:50 PM
include/llvm/Analysis/ModuleSummaryAnalysis.h
32 ↗(On Diff #53088)

Good point, don't need it. Will clean this up before I submit.

lib/Analysis/ModuleSummaryAnalysis.cpp
69 ↗(On Diff #53088)

Yes but it has to wait until your anonymous function renaming patch is in.

This revision was automatically updated to reflect the committed changes.