This is an archive of the discontinued LLVM Phabricator instance.

Pass sample pgo flags to thinlto.
ClosedPublic

Authored by danielcdh on Dec 14 2016, 6:00 PM.

Details

Summary

ThinLTO needs to invoke SampleProfileLoader pass during link time in order to annotate profile correctly after module importing.

Event Timeline

danielcdh updated this revision to Diff 81509.Dec 14 2016, 6:00 PM
danielcdh retitled this revision from to Pass sample pgo flags to thinlto..
danielcdh updated this object.
danielcdh added reviewers: tejohnson, davidxl.
danielcdh added a subscriber: llvm-commits.
davide added a subscriber: davide.Dec 14 2016, 6:04 PM

Do we need a similar change for lld?

pcc added a subscriber: pcc.Dec 14 2016, 6:18 PM

Do we need to add the sample profile to the hash in computeCacheKey?

In D27790#623176, @pcc wrote:

Do we need to add the sample profile to the hash in computeCacheKey?

I'm concerned about this indeed.

Why is this SampleProfileLoaderPass not performed during the compile phase?

In D27790#623176, @pcc wrote:

Do we need to add the sample profile to the hash in computeCacheKey?

I was planning to have it in a separate patch as it needs to hash the profile content too. Let me know if you think it should be part of this patch.

I'm concerned about this indeed.

Why is this SampleProfileLoaderPass not performed during the compile phase?

It is invoked in compiler phase. But if the profile is collected from ThinLTO binary, i.e. there is cross-module inlines in the profiling binary (thus in profile), the profile annotation needs to happen after all these inlines happened. i.e. annotation needs to be invoked once again in the linking phase.

Thanks,
Dehao

Ouch, I see: with sample-based profiling you can't change the callgraph after the instrumentation point to have the correct information, correct? Is changing the CFG OK though? (If you have a pointer about sample-based profiling where I can find the answer to these subtle points).

Hashing the full content of the profiles would break the "fine" grain of incremental build we have now. So I'd be cautious about that, even if this is likely a marginal use-case.

Considering sample pgo profile as a weighted-forest of inline stacks, each root corresponds to a symbol in the profile binary. The job of profile preparation is to expand the IR to resemble the hot part of the forest as in profile. More details about the sample pgo can be found in http://dl.acm.org/citation.cfm?id=2854044

At ThinLTO compile step, not all nodes is expandable as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.

In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?

Considering sample pgo profile as a weighted-forest of inline stacks, each root corresponds to a symbol in the profile binary. The job of profile preparation is to expand the IR to resemble the hot part of the forest as in profile. More details about the sample pgo can be found in http://dl.acm.org/citation.cfm?id=2854044

At ThinLTO compile step, not all nodes is expandable as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.

I have to read the citation to understand :)

In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?

I don't think so: for instrumentation based PGO is applied during the compile phase (there is certainly a key difference that does not make it possible with the sample based one).

Considering sample pgo profile as a weighted-forest of inline stacks, each root corresponds to a symbol in the profile binary. The job of profile preparation is to expand the IR to resemble the hot part of the forest as in profile. More details about the sample pgo can be found in http://dl.acm.org/citation.cfm?id=2854044

At ThinLTO compile step, not all nodes is expandable as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.

I have to read the citation to understand :)

In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?

I don't think so: for instrumentation based PGO is applied during the compile phase (there is certainly a key difference that does not make it possible with the sample based one).

Yes, it's in compile phase, but if the profile changes, the profile summary is very likely to change, which makes it necessary to recompile all functions unless the function has no profile at all. Or am I missing something?

Yes, it's in compile phase, but if the profile changes, the profile summary is very likely to change, which makes it necessary to recompile all functions unless the function has no profile at all. Or am I missing something?

Oh right, I missed your point, this seems valid to me!

tejohnson edited edge metadata.Dec 15 2016, 2:07 PM

The change looks ok to me, but I'm concerned about having the caching broken with ThinLTO + sample PGO. At least let's get that part reviewed, so these patches can go in back to back.

danielcdh updated this revision to Diff 81680.Dec 15 2016, 3:40 PM
danielcdh edited edge metadata.

Add hashing of sample profile content.

Looks pretty easy to add the hashing, so I included it in this patch.

mehdi_amini accepted this revision.Dec 15 2016, 4:15 PM
mehdi_amini added a reviewer: mehdi_amini.

LGTM.

This revision is now accepted and ready to land.Dec 15 2016, 4:15 PM
tejohnson added inline comments.Dec 15 2016, 4:24 PM
lib/LTO/LTO.cpp
135 ↗(On Diff #81680)

Will this be expensive, for large apps with large profiles? Since this is the same for each backend, can we compute once and pass down?

mehdi_amini added inline comments.Dec 15 2016, 4:29 PM
lib/LTO/LTO.cpp
135 ↗(On Diff #81680)

If we go this route (and we should), then I rather do it as a separate patch and do as I suggested originally when the config was added there: have a hash method on the config itself where we would hash early the whole config part and do it once early.

tejohnson accepted this revision.Dec 15 2016, 4:35 PM
tejohnson edited edge metadata.
tejohnson added inline comments.
lib/LTO/LTO.cpp
135 ↗(On Diff #81680)

Agreed, all the module-independent stuff should be hashed early and passed down. I suppose this is the best approach for now - maybe slow, but safe and enables this change to go in.

We can pull it out as part of the config early hashing implementation in a separate patch.

danielcdh closed this revision.Dec 16 2016, 8:59 AM