Page MenuHomePhabricator

[PGO] differentiate FE instrumentation and IR level instrumentation profiles

Authored by xur on Dec 15 2015, 2:37 PM.



This patch uses one bit in profile version to differentiate FE instrumentation and IR level instrumentation profiles. It is based on the suggestion in RFC: PGO Late instrumentation for LLVM

PGOInstrumenation now checks this bit to make sure it's IR level instrumentation profiles. PGOInstrumenation also creates a new COMDAT variable __llvm_profile_ir_level and sets the value to 1. This variable tells the profile runtime to set the right version in the generated profiles.

llvm-profdata also handles the profiles differently: mainly for the MaxFunctionCount. For FE profile, it only needs to find the max of the entry count (the first count in the function). For IR level profile, the entry count might not be available, we will set it as the maximum block count in the profile.

Diff Detail


Event Timeline

xur updated this revision to Diff 42913.Dec 15 2015, 2:37 PM
xur retitled this revision from to [PGO] differentiate FE instrumentation and IR level instrumentation profiles.
xur updated this object.
xur added reviewers: davidxl, silvas, bogner.
xur added a subscriber: llvm-commits.
davidxl added inline comments.Dec 15 2015, 3:35 PM
681 ↗(On Diff #42913)


682 ↗(On Diff #42913)

Why bit 60?

Also I think we should reserve perhaps 8 bits in the version so that we can use the same format for other variants of profile (for instance completely strips-off name if the user does not care about it). For that, we need to define a set of masks:

#define VARIANT_MASKS_ALL 0xf000000000000000ULL

687 ↗(On Diff #42913)

Why do you need to unset it?

689 ↗(On Diff #42913)

Also remember to sync up the copy in compiler-rt of this file

109 ↗(On Diff #42913)

Text format does not have version. Make this a boolean flag.

120 ↗(On Diff #42913)

Do not introduce this interface in the base class -- Version means different things across Raw and Indexed reader.

Simply add an interface to query if it is for late instr.

141 ↗(On Diff #42913)

Ok to add this field in implementation -- add a comment that Version value here is a OR of mask and real version.

166 ↗(On Diff #42913)

Do not expose version at interface level. Just 'isIRLevelProfile'

331 ↗(On Diff #42913)

This interface needs to be changed to mask out the flag bit -- otherwise you change the semantics of the old interface (which fortunately is not yet used).

{ return GET_VERSION(Index->getVersion()); }

Add a new interface for isIRLevelProfiling.

43 ↗(On Diff #42913)

make IsIRInstr a member field of the writer so that you don't need to change interfaces here.

112 ↗(On Diff #42913)

Missing documentation.

121 ↗(On Diff #42913)

Report error if V is not 0 nor 1.

224 ↗(On Diff #42913)

Do not use 'UNSET'..., but use GET_VERSION macro I suggested.

523 ↗(On Diff #42913)


127 ↗(On Diff #42913)

Check Dest's Counts with merged values.

150 ↗(On Diff #42913)

It is actually more readable with direct IndexedInstrProf::version | IR_PROF_VARIANT_MASK

669 ↗(On Diff #42913)

may be a helper function for this.

675 ↗(On Diff #42913)

As it is shared, put the string macro in -- see other section strings.

Also add an interface in InstrProf.h to retrieve the string -- do not hard code string here.

677 ↗(On Diff #42913)

Why align 8?

679 ↗(On Diff #42913)

No literal string

120 ↗(On Diff #42913)

Simplify the code like this:

  1. introduce a interface in writer:

setIsIRLevelProfile (bool true|false); On writer construction, the value is 'unknown'. When the set interface is called, it will check if it is set and compare previous value with current value -- reports Error when incosistence is detected.

  1. the code will be just: if (std::error_code EC = Writer.setIsIRLevelProfile(Reader->isIRLevelProfile()) exitWithError("....);
silvas edited edge metadata.Dec 15 2015, 3:58 PM

llvm-profdata also handles the profiles differently: mainly for the MaxFunctionCount. For FE profile, it only needs to find the max of the entry count (the first count in the function). For IR level profile, the entry count might not be available, we will set it as the maximum block count in the profile.

If I understand correctly, the motivation is that some functions might have already been inlined when IR-level instrumentation happens, and so we assume that any individual basic block could have been from an inlined function, so to recover that "function"'s count, we take the highest basic block count. This doesn't make sense. I expect MaxFunctionCount to be the maximum function count *at the point where the instrumentation was done*. And the instrumentation (whether Clang's stuff or the IR-level stuff) puts in an entry count on every function it sees at the point where the instrumentation was done.

An example of why this matters is that many of my customers have some extremely hot functions like operator+ on a Vec4 class. A big advantage of the advantage of the IR-level instrumentation (especially early-inlining) is to be able to get rid of these functions before the instrumentation happens. I want MaxFunctionCount to represent the maximum function count *after* early inlining has cleaned up all the trivial inlinable functions. Or, to look at it from a different perspective, my clients use Vec4 as much as int; we do not want to count operator+ on Vec4 for the same reason we do not keep a count for the built-in operator+ on int, and we do not want MaxFunctionCount to be skewed by it. Early inlining is an elegant solution. I consider the current behavior a feature.

Or to put it another way, I think that my suggestion of "simply do not instrument the top 1% hottest functions" was a potential "quick fix", but the early inlining (+ other early optimizations) is a much more elegant and complete solution. Once we have characterized the benefits of early optimizations we will be able to more clearly evaluate whether we need to avoid instrumenting the hottest functions.

Right now, I think we should focus on integration into clang, since that is necessary for characterizing the effects of early optimizations so that we can decide on a good set of "early" passes to run to minimize instrumentation overhead. (I am glad to help out once it has been integrated into clang)

xur updated this revision to Diff 43189.Dec 17 2015, 3:10 PM
xur edited edge metadata.
xur marked 20 inline comments as done.

Integrated David's comments.



davidxl added inline comments.Dec 18 2015, 11:36 AM
681 ↗(On Diff #43189)

typo : uint64_t, not uint_64_t.

691 ↗(On Diff #43189)

Use existing macro: INSTR_PROF_QUOTE, no need for a new helper macro.

146 ↗(On Diff #43189)

Raw --> raw

322 ↗(On Diff #43189)

do not remove override here -- note that getVersion here is for indexed format class hierarchy (to support different indexed formats ).

Note that Version really means format version, so the masking needs to be applied here too.

Add another interface to check isIRLevelProfiling.

322 ↗(On Diff #43189)

mask out the variant bits

339 ↗(On Diff #43189)

Remove 'Note that is a unmasked version' -- it is confusing.

340 ↗(On Diff #43189)

No need for GET_VERSION here -- it needs to be applied by the callee

342 ↗(On Diff #43189)

return Index->isIRLevelProfile();

30 ↗(On Diff #43189)

PF_None --> PF_Unknown

117 ↗(On Diff #43189)

Why not skipping comments and empty lines first:

// Skip empty lines and comments.

while (!Line.is_at_end() && (Line->empty() || Line->startswith("#")))
119 ↗(On Diff #43189)

if (!Line->startsWith(":")) {

isIRLevelProfile = false;
return success();


132 ↗(On Diff #43189)

Add a test case for the error.

xur marked 10 inline comments as done.Dec 18 2015, 12:01 PM
xur added inline comments.
117 ↗(On Diff #43189)

current code allows multiple instances of ":" line and the mixing of comments in between (late ones overrides the earlier ones).
if you think we should only allow one. We can simplify the code like you suggested.

119 ↗(On Diff #43189)

same as above. current code allows multiple ":" lines.

davidxl added inline comments.Dec 18 2015, 12:07 PM
117 ↗(On Diff #43189)

We don't want that kind of flexibility for the header structure. I would even suggest making it more strict -- the header lines (if there are any) must start at the first line of the file -- but I won't insist on that.

xur updated this revision to Diff 43258.Dec 18 2015, 12:37 PM
xur marked an inline comment as done.

updated the patch with David's new comments.

davidxl edited edge metadata.Dec 21 2015, 11:11 AM

As a follow up, there should also be a change in Clang FE that checks IR level profile is not misused by FE.

120 ↗(On Diff #43258)

Missing new field initialization.

30 ↗(On Diff #43258)

Fix format.

slingn added a subscriber: slingn.Dec 21 2015, 11:31 AM
slingn added inline comments.Dec 21 2015, 11:59 AM
57 ↗(On Diff #43258)

Is it possible to determine ProfileKind (and IsIRLevel) at construction time? If it is, that seems like it would be less error prone than inferring it from IsIRLevel / leaving as PF_Unknown if setIsIRLevelProfile().

113 ↗(On Diff #43258)

If the text format is intended to be mostly human readable it would be nice to use something like ':irlevel' rather than a number since that requires referring to the code.

130 ↗(On Diff #43258)

Does this handle the case where V could not be parsed as an integer - i.e. getAsInteger() returns true to indicate an error?

668 ↗(On Diff #43258)

minor: 'to let runtime' -> 'to make the runtime'

1 ↗(On Diff #43258)

typo: flat -> flag

A few more of the same in the other *.proftext files.

xur marked 5 inline comments as done.Dec 21 2015, 12:59 PM
xur added inline comments.
57 ↗(On Diff #43258)

in current profile merge, we generate the writer before reading the profile. So it's difficult to set the ProfileKind in the construct. We could sink the code of creation of the writer into the loop that reading the profiles (after reading the first one). But I'm not sure that coding style is good.

113 ↗(On Diff #43258)

The advantage of integer is that we can encode other profile variant into one number. As of now, this is the only variant but it's likely we will have more. Also note that usually the text format is generated by llvm-profdata in which case, we do have a comment to explain the the meaning of ":1".

130 ↗(On Diff #43258)

Good catch. Fixed in the new patch.

xur updated this revision to Diff 43393.Dec 21 2015, 1:01 PM
xur edited edge metadata.
xur marked an inline comment as done.

Integrated the latest review comments from David and Nathan.



This version looks good to me, but please wait for Justin's feedback on the approach in general.

xur updated this revision to Diff 45050.Jan 15 2016, 4:20 PM

rebased the patch with recent change. Also have the following changes:
(1) directly generate __llvm_profile_raw_version variable so that no compile_rt change is needed.
(2) use ":ir or :fe" instead of a number as a ProfileKind flag in text profile.

After this patch is done, the FE also needs to be changed to recognize IR profile and do the right handling.

710 ↗(On Diff #45050)

Please rebase the patch. This part is already in tree.

162 ↗(On Diff #45050)

Change the comments to:

The value of the version field of the raw profile data header. The lower 56 bits specifies the format version and the most significant 8 bits specify the variant types of the profile.

723 ↗(On Diff #45050)

This warning message needs to be revisited later once the clang option is finalized. Please add a TODO here.

xur updated this revision to Diff 46015.Jan 26 2016, 10:50 AM
xur marked 3 inline comments as done.

rebased the patch and integrated David's most recent comments.



davidxl added inline comments.Feb 2 2016, 8:21 PM
182 ↗(On Diff #46015)

MaxFunctionCount field is now removed (replaced by profileSummary) -- so this change can now be removed.

xur updated this revision to Diff 46938.Feb 4 2016, 11:50 AM

updated patch after synced to MaxFunctionCount cleanup commits by David

Add a negative test case with error when FE based profile is used in IR level profile-use.

xur updated this revision to Diff 46959.Feb 4 2016, 1:52 PM

remove one unused function in InstrProfWrite in last revision.

The latest patch uploaded does not look complete .

xur updated this revision to Diff 46965.Feb 4 2016, 2:54 PM

re-upload the patch.

davidxl accepted this revision.Feb 5 2016, 3:32 PM
davidxl edited edge metadata.


This revision is now accepted and ready to land.Feb 5 2016, 3:32 PM
This revision was automatically updated to reflect the committed changes.