ThinLTO needs to invoke SampleProfileLoader pass during link time in order to annotate profile correctly after module importing.
Details
Diff Detail
- Build Status
Buildable 2154 Build 2154: arc lint + arc unit
Event Timeline
I'm concerned about this indeed.
Why is this SampleProfileLoaderPass not performed during the compile phase?
I was planning to have it in a separate patch as it needs to hash the profile content too. Let me know if you think it should be part of this patch.
I'm concerned about this indeed.
Why is this SampleProfileLoaderPass not performed during the compile phase?
It is invoked in compiler phase. But if the profile is collected from ThinLTO binary, i.e. there is cross-module inlines in the profiling binary (thus in profile), the profile annotation needs to happen after all these inlines happened. i.e. annotation needs to be invoked once again in the linking phase.
Thanks,
Dehao
Ouch, I see: with sample-based profiling you can't change the callgraph after the instrumentation point to have the correct information, correct? Is changing the CFG OK though? (If you have a pointer about sample-based profiling where I can find the answer to these subtle points).
Hashing the full content of the profiles would break the "fine" grain of incremental build we have now. So I'd be cautious about that, even if this is likely a marginal use-case.
Considering sample pgo profile as a weighted-forest of inline stacks, each root corresponds to a symbol in the profile binary. The job of profile preparation is to expand the IR to resemble the hot part of the forest as in profile. More details about the sample pgo can be found in http://dl.acm.org/citation.cfm?id=2854044
At ThinLTO compile step, not all nodes is expandable as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.
In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?
I have to read the citation to understand :)
In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?
I don't think so: for instrumentation based PGO is applied during the compile phase (there is certainly a key difference that does not make it possible with the sample based one).
Yes, it's in compile phase, but if the profile changes, the profile summary is very likely to change, which makes it necessary to recompile all functions unless the function has no profile at all. Or am I missing something?
The change looks ok to me, but I'm concerned about having the caching broken with ThinLTO + sample PGO. At least let's get that part reviewed, so these patches can go in back to back.
lib/LTO/LTO.cpp | ||
---|---|---|
135 ↗ | (On Diff #81680) | Will this be expensive, for large apps with large profiles? Since this is the same for each backend, can we compute once and pass down? |
lib/LTO/LTO.cpp | ||
---|---|---|
135 ↗ | (On Diff #81680) | If we go this route (and we should), then I rather do it as a separate patch and do as I suggested originally when the config was added there: have a hash method on the config itself where we would hash early the whole config part and do it once early. |
lib/LTO/LTO.cpp | ||
---|---|---|
135 ↗ | (On Diff #81680) | Agreed, all the module-independent stuff should be hashed early and passed down. I suppose this is the best approach for now - maybe slow, but safe and enables this change to go in. We can pull it out as part of the config early hashing implementation in a separate patch. |