This is an archive of the discontinued LLVM Phabricator instance.

[PGO] Refactor Value Profiling into a plugin based oracle and create a well defined API for the plugins.
ClosedPublic

Authored by w2yehia on Sep 23 2019, 8:37 AM.

Details

Summary

This PR creates a utility class called ValueProfileOracle that tells PGOInstrumentationGen and PGOInstrumentationUse what to value-profile and where to attach the profile metadata.
It then refactors logic scattered in PGOInstrumentation.cpp into two plugins that plug into the ValueProfileOracle.

Diff Detail

Repository
rL LLVM

Event Timeline

w2yehia created this revision.Sep 23 2019, 8:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2019, 8:37 AM

In general, I like the cleanup of extracting out the value profiling handling into a uniform API. I'm not convinced it needs something as complex as the plugin interface though, as opposed to a wrapper class that has an instance of each type and invokes the appropriate run() command given the requested profile type. Although perhaps there isn't a big downside of doing it that way either, not sure of the overhead of the resulting template instantiation.

Comment below about naming conventions.

llvm/lib/Transforms/Instrumentation/ValueProfileOracle.cpp
35
w2yehia updated this revision to Diff 221943.Sep 26 2019, 7:24 AM

Changed 'VPOPluginChain_t' to 'VPOPluginChainFinal` based on Teresa's comment.

w2yehia marked an inline comment as done.Sep 26 2019, 7:25 AM

This generally looks good, but I suggest changing the name of ValueProfileOracle to ValueProfileCollector to make it explicit.

w2yehia updated this revision to Diff 222128.Sep 27 2019, 11:09 AM

I enclosed the VPOPluginChain template class in an unnamed namespace so that all instantiated member functions (and constructors) get internal linkage rather than external weak, which will hopefully give more reason to inline them and eliminate their definitions. Thanks to Hubert Tong for this suggestion.

This did not make a difference to the symbol export list inside the clang executable or inside the shared library (libLLVMInstrumentation.so.10svn) that contains ValueProfileOracle.cpp, because the build compiler (clang 8.0) I'm using was able to inline all VPOPluginChain member functions and eliminate their defs.

w2yehia updated this revision to Diff 222219.Sep 27 2019, 11:58 AM

Renamed ValueProfileOracle to ValueProfileCollector. I agree that at the moment, this utility class has a single purpose and that's to collect value profiling candidates; so the new name makes sense.

One more minor nit about naming conventions. Otherwise it looks ok to me, but wait for David to sign off.

llvm/lib/Transforms/Instrumentation/ValueProfileCollector.cpp
47 ↗(On Diff #222219)

No underscore

w2yehia updated this revision to Diff 222227.Sep 27 2019, 12:28 PM
  1. Renamed Plugin_t to PluginT, and checked that no other type name has an underscore.
  2. Renamed VPCPluginChain to PluginChain
w2yehia marked an inline comment as done.Sep 27 2019, 12:29 PM
davidxl accepted this revision.Sep 29 2019, 11:01 AM

lgtm, but please wait to see if xur@ has more comments.

This revision is now accepted and ready to land.Sep 29 2019, 11:01 AM

Teresa and David, thanks for reviewing.
Rong Xu, I'll wait for your approval.

xur added a comment.Oct 1 2019, 11:42 AM

Sorry it takes long to review this.
I have a pending patch that value profiling the expensive MOD/REM instructions. It is more tightly coupled to edge profile (than current mem-size-op and icall value profile).
The difference is that it does not have <valuesite, value> pair. The counts are stored like the edge counts.
It seems to be fine with patch as the instrumentation says in PGOInstrumenation.cpp and only the candidate collecting is extracted out.

Overall, I'm fine with this refactor/improvement.

llvm/lib/Transforms/Instrumentation/ValueProfileCollector.h
49 ↗(On Diff #222227)

Sorry, I don't follow this paragraph. What is 'total" here, the BB count for "if ...", or the total count for memcpy"?

We do need a match for the time of instrumentation and use.

58 ↗(On Diff #222227)

For value profile, we usually find the target instruction and instrument right before it. The annotation will be on that instruction.
So here InsertPT and AnnotatedInstr should be always the same.
I don't think of a need to differentiating this two.

w2yehia marked 2 inline comments as done.Oct 1 2019, 1:29 PM

Thanks Rong.
Let me know what you think.

llvm/lib/Transforms/Instrumentation/ValueProfileCollector.h
49 ↗(On Diff #222227)

Here I'm trying to demonstrate to the reader that at different points in a user program the value profile info that can be potentially collected for a particular expression are not the same.
I'm using the term 'total' in the same sense as the 3rd operand (also called "total" I think) of a value profile metadata node.
So, total=100 is the total number of data points (i.e. values) that nn took when measured at the beginning of the function foo;
and total=15 is the total number of data pointer that nn took when measured right before executing the memcpy (so inside the if branch).

I understand that the IR has to match at the time of instrumentation and the time of use.

58 ↗(On Diff #222227)

It is true that for the current two plugins (memop and ICP) the Insertion point and Annotated Instruction are the same.
But for something like loop trip count profiling (something we're experimenting with), we found it necessary to have separate instruction pointers.
To be more exact, the loop trip count value profile MD gets stored on the branch instruction of the loop latch block, but the insertion point is somewhere in the loop pre-header.

xur added inline comments.Oct 1 2019, 2:40 PM
llvm/lib/Transforms/Instrumentation/ValueProfileCollector.h
49 ↗(On Diff #222227)

OK. The profiled value should reflect the value being profiled (the InsertPt instruction).
The user should be responsible for that. For that case, if one instruments memcpy(...) , they should get total=15 etc.
If they instrument "if (b)", they should get total=100.

The use part should know do different things based on what is instrumented. I think you really meant something like
"Value profiling an expression means to track the values that this expression

takes at runtime and the frequency of each value at the instrumented instructions"
58 ↗(On Diff #222227)

OK. But in theory, the collect does not need to know the annotaedInstr as it will not be used in instrumentation. Right?

w2yehia marked 2 inline comments as done.Oct 1 2019, 5:35 PM
w2yehia added inline comments.
llvm/lib/Transforms/Instrumentation/ValueProfileCollector.h
49 ↗(On Diff #222227)

Yes. the "at the instrumented instructions" addition is necessary to make the statement more precise.
Actually, I wrote this whole paragraph to point out a subtle point that I missed myself when working on value profiling, and that is the program location where the value of an expression is recorded matters.

58 ↗(On Diff #222227)

I assume you are talking about the ValueProfileCollector when you say "the collect".
The ValueProfileCollector is used in both the PGOInstrumentationGen and PGOInstrumentationUse phases.
However, it is true that only the Value *V and Instruction *InsertPt fields are used in the PGOInstrumentationGen phase, while only the Instruction *AnnotatedInstr field is used in the PGOInstrumentationUse phase.

The reason all three members are in one struct rather than in two structs (one for each phase) is that it was natural to generate all pieces of information about a particular candidate in one place in the code, and then have each phase use what it needs from it.

xur accepted this revision.Oct 2 2019, 10:50 AM

LGTM.

Thanks for improving this.

This revision was automatically updated to reflect the committed changes.