This is an archive of the discontinued LLVM Phabricator instance.

[PGO] context sensitive PGO
ClosedPublic

Authored by xur on Nov 6 2018, 12:17 PM.

Details

Summary

Current PGO profile counts are not context sensitive. The branch probabilities for the inlined functions are kept the same for all call-sites, and they might be very different from the actual branch probabilities. These suboptimal profile can greatly affect some downstream optimizations, in particular for machine basic block placement optimization.

In this patch, we propose to have a post-inline PGO instrumentation/use pass, which we called Context Sensitive PGO (CSPGO). For the users who want the best possible performance, they can perform a second round of PGO instrument/use on the top of the regular PGO. They will have two sets of profile counts. The first pass profile will be manly for inline, indirect-call promotion, and CGSCC simplification pass optimizations. The second pass profile is for post-inline optimizations and code-gen optimizations.

The typical usage models are:
// Regular PGO instrumentation and generate pass1 profile.

> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata

// CSPGO instrumentation.

> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2

// Merge two sets of profiles

> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata

// Use the combined profile. Pass manager will invoke two PGO use passes.

> clang -O2 -fprofile-use=profile.profdata -o use

The major changes are the following:

  • bump index profile version to indicate if there are context-sensitive counts.
  • change the FuncHash and reserve bits for context-sensitive counts.
  • change the ProfileSummary interface as we might have two sets of profile summaries.
  • change pass-manager (legacy and new)
  • change llvm-profdata to handle two kind of profiles.
  • change instrprofiling lowing for more aggressive register promotion for CSPGO.

Note that other than the FuncHash change, there is no effect on existing PGO pass.

This patches is functional tested using SPEC2006, with the various combinations of the following:

  • legacy-pass manager
  • new pass manager
  • thin-lto (through gold plugin)
  • thin-lto (though driver command line)
  • lto (through plugin)
  • non-lto

We performance-tested this patch with a few large Google benchmarks and saw good performance improvements.

I did not include any tests in this patch. I will add tests once the interfaces are decided in the code review.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
davidxl added inline comments.Nov 7 2018, 9:06 AM
include/llvm/ProfileData/InstrProf.h
777

nit: FlaginHash --> FlagInHash

780

FlaginHash -> FlagInHash.

902

There is probably no need to bump the version number if there is a new variant bit set in the profile data. The format of existing variant do not change.

include/llvm/ProfileData/InstrProfReader.h
148

Why? It would be nice to add support in text reader as well.

lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

This won't work well -- the non-CS FunctionHash may collide with CS hash. How do you differentiate?

Also doing this for nonCS profiling also breaks the backward compatibility -- old profile with high bits set in hash won't be found anymore. It may also create more hash conflicts.

xur added inline comments.Nov 7 2018, 10:01 AM
lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

That is the reason I bump the version number. I use the version number to tell if this is old Hash format or new Hash format.

davidxl added inline comments.Nov 7 2018, 10:10 AM
lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

Say there exists an old version profile with the 60-63 bits set for some functions. When this profile is used, the new compiler won't be able to find those functions.

Also where is the code that checks the version number?

Test cases are missing in the patch.

Hi Rong, at a high-level, I like what this patch is doing. I'll try to leave in-depth comments by next week -- please ping the review otherwise.

These suboptimal profile can greatly affect some downstream optimizations, in particular for machine basic block placement optimization.

If it's possible to get these numbers, it'd be interesting to know how the improvements from this patch compare to link-time or post-link block ordering tools (Bolt).

xur added inline comments.Nov 12 2018, 9:40 AM
lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

or this we will use one VARIANT bit in header to distinguish. AnnotateAllFunction will check if the VARIANT flag is set when reading CS profiles.
For old index profile, this won't be a problem as this bit will not be set.

This could be a problem for old raw profile. As of now, I did not change instrumentation run-time. So I could only use FuncHash to tell if this regular count or CS count. Setting of VARIANT bit in index profile relies on the new FuncHash scheme.

I can add change to profile runtime and set the VARIANT for raw profile. Then setting the VARIANT bit for index profile will not rely on check of FuncHash. This will fully backward compatible.

The new FuncHash scheme will be guarded by a check of VARIANT bit. Maybe we don't need bump the profile version.

Right now, It's only in IndexedInstrProfReader::readSummary(). I agree that we probably need to place more checks on this.

xur added a comment.Nov 12 2018, 9:40 AM
In D54175#1291940, @vsk wrote:

Hi Rong, at a high-level, I like what this patch is doing. I'll try to leave in-depth comments by next week -- please ping the review otherwise.

Sounds good. Looking forward to your reviews.

These suboptimal profile can greatly affect some downstream optimizations, in particular for machine basic block placement optimization.

If it's possible to get these numbers, it'd be interesting to know how the improvements from this patch compare to link-time or post-link block ordering tools (Bolt).

We mainly use a key google benchmark for performance evaluation. Note that this benchmark has been highly tuned. A performance gain of 0.5% is considered significant.

Our experiments shows 1.5% to 2% from CSPGO. The improvement is mainly from machine basicblock placement. CSPGO also sets the accurate function entry count (for the cases that inline are cross module). That will make the hot-text section more dense. This can be seen from instruction heat map. But the performance improvement was small.

BOLT can have similar performance boosts from basic block reordering if it's applied to regular PGO binary (i.e. ~2%)
BOLT can improve the CSPGO binary by 0.5% and that is from function splitting, which is not currently enabled in llvm.

davidxl added inline comments.Nov 12 2018, 3:53 PM
lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

I think it is better to emit variant bit in raw profile data. Relying on func hashing content sounds fragile.

The masking and adding bit in function hash is only needed for CS instrumentation and annotation. With this, the old profile (non CS) will still be valid. There is a slight downside that nonCS and CS profile can collide -- but this is very unlikely given all the inlining/cfg transformations that had happened.

vsk added inline comments.Nov 13 2018, 10:21 PM
include/llvm/Analysis/ProfileSummaryInfo.h
48

It doesn't seem necessary to add a flag to computeSummary or to add recomputeSummary to PSI's API.

Why not call ProfileSummaryInfo::invalidate when attaching CSPGO data to a Module? invalidate() could be modified to recompute the summary. That keeps the APIs a bit simpler.

include/llvm/IR/Module.h
860

I think {set, get}ProfileSummary would be safer & easier to understand if clients had to pass in a non-optional enum specifying which profile data kind they need (maybe use ProfileSummary::Kind for this?). That should make it impossible to mix-and-match CS + the wrong format.

include/llvm/IR/ProfileSummary.h
49–50

Does this really need to be in the header? Could we just take the opportunity to delete this declaration, and make the definition static?

include/llvm/Passes/PassBuilder.h
44–46

I think this could use a little more explanation (why would CSProfileGen not need ProfileUse?).

610–611

I think it'd be best to consolidate flags here, and replace {bool RunProfileGen, bool IsCS} with {enum InstrumentionKind} (or similar, there must be an enum for this already).

include/llvm/ProfileData/InstrProfReader.h
473–481

I'm not sure that the right behavior here should be to default to the non-context sensitive data. Why not either make the bool argument non-optional, or, take the max of the two?

500–508

I don't think the bool argument here should have a default value. Clients should explicitly pick which kind of summary they need.

include/llvm/ProfileData/InstrProfWriter.h
87

Using xor in this context is a bit unconventional. Could you rewrite this using ands/ors to make it easier to understand?

lib/ProfileData/InstrProfReader.cpp
771

No need to assert on the version, it's checked before readSummary is called.

771

Please write out the type of Summary here for clarity.

xur marked 3 inline comments as done.Nov 21 2018, 11:33 AM

Thanks vsk and david for the reviews. I will update the patch to integrate the comments.

include/llvm/Analysis/ProfileSummaryInfo.h
48

I agree that change invlidate() would be more clean (and better). My concern was that would change current behavior.
invalidate() be called by passmanager. Currently it does nothing, If I change that, profilesummary could be recomputed many time as most passes will invalidate all the analysis.

include/llvm/IR/Module.h
860

I will change to use an enum parameter (and no default).

include/llvm/IR/ProfileSummary.h
49–50

I think this is a good idea. the strings are only used in ProfileSummary.cpp. I will change.

include/llvm/Passes/PassBuilder.h
44–46

Sorry. This will be deleted.
I used this to work around an issue with out internal tool.

Thanks for catching this.

610–611

yes. I agree.

include/llvm/ProfileData/InstrProf.h
902

OK. no version change.

include/llvm/ProfileData/InstrProfReader.h
148

Text format is supported (mostly). I say mostly because there is not text string for CS format. But the flat is encoded in header (and in checksum).

473–481

I can change the take bool argument.

Taking max is not a good idea as the client need to know what kind of summary it uses. Mixing these two will cause wrong hot/cold attributes.

include/llvm/ProfileData/InstrProfWriter.h
87

OK. I will change to use and/or

lib/ProfileData/InstrProfReader.cpp
771

Do you mean comments?

vsk added inline comments.Nov 26 2018, 11:07 AM
include/llvm/Analysis/ProfileSummaryInfo.h
48

Would it be feasible to update any passes which inadvertently invalidate PSI to preserve it instead?

include/llvm/ProfileData/InstrProfReader.h
473–481

Sounds good.

lib/ProfileData/InstrProfReader.cpp
771

I meant 'ProfileSummary &Summary', instead of 'auto &Summary'. I think the coding guidelines suggest writing out the type explicitly when it may not be obvious from the immediate context. This is subjective though, I don't have a strong opinion about this.

xur marked 17 inline comments as done.Nov 29 2018, 11:54 AM
xur added inline comments.
include/llvm/Analysis/ProfileSummaryInfo.h
48

I found that we don't need to invalidate summary.
Normal PGO-use pass is before per-function optimization passes
and CS pass of PGO-use is in per-module optimization passes.
They do not share the analyse pass result -- All the results in
per-function optimization pass are discarded and recomputed
again in per-module optimization pass.

I will revert this change.

include/llvm/Passes/PassBuilder.h
610–611

hmm. IsCS is also used by ProfileUse. Even I combine like this, I still need two flags.

include/llvm/ProfileData/InstrProfReader.h
148

Now. "csir" is supported keyword in proftext format.

lib/Transforms/Instrumentation/PGOInstrumentation.cpp
618

Now the raw profile also emit variant.

But I still relying on funchash to tell if the profile for this function is CS or not.

I agree the chance of collision is low. I think what matter more is the summary computation. Functions with lots of select instruction will counts as CS profile and used in CS profile summary. This definitively happens but I'm not sure if matters for performance.

I lean to change the hash unconditionally. For old profiles, we will report a hash-mismatch and the users need to recollect the profile. I think this could happen for other compiler change (like changes before pgo instrumentation).

xur updated this revision to Diff 175921.Nov 29 2018, 12:08 PM

This new version integrated the review comments from David and Vsk. Noticeable changes are:
(1) Combined the following 3 field in PGOOptions struct

bool RunProfileGen;
bool RunCSProfileGen;
bool RunCSProfileUse;

into an enum. There three are mutually exclusive.

(2) Not use default arguments for a set of functions.

(3) Don't bump up index profile version

(4) CS variant flag is set in raw profile and use that to tell if the profile has CS counters.

(5) Better handle in profile text format.

(7) Revert ProfileSummary recomputation as it is not needed.

(8) Fix a bug in the change to instrprofiling promotion.

I will also update
https://reviews.llvm.org/D54177
and
https://reviews.llvm.org/D54176

xur updated this revision to Diff 176208.Nov 30 2018, 2:56 PM

Fix an issue in LegacyPassManager where CSPGO is not separated from -fprofile-use cleanly.

tejohnson added inline comments.Dec 3 2018, 10:07 AM
lib/LTO/LTOBackend.cpp
152

Presumably need a corresponding change in runOldPMPasses?

You'll also want to have a way to test via llvm-lto2. E.g. llvm-lto2 has an option for passing a sample profile.

tools/gold/gold-plugin.cpp
900

Needs tests

Also, support should be added to lld too - is there a separate patch for that?

tools/llvm-profdata/llvm-profdata.cpp
231

leftover debug output

xur marked 4 inline comments as done.Dec 3 2018, 10:51 AM
xur added inline comments.
lib/LTO/LTOBackend.cpp
152

You are right. I need to set the options for oldPM.

tools/gold/gold-plugin.cpp
900

I haven't done lld patch yet. There are too many moving parts at this stage.

tools/llvm-profdata/llvm-profdata.cpp
231

Removed.

xur updated this revision to Diff 176433.Dec 3 2018, 10:53 AM
xur marked an inline comment as done.

Integrated with Teresa's review comments.

I will add tests in a later patch.

In D54175#1317088, @xur wrote:

Integrated with Teresa's review comments.

I will add tests in a later patch.

I really think that tests for gold and for LTO should be added with this patch, since it is adding the functionality for those.

lib/LTO/LTOBackend.cpp
284

Is it deliberate that here in the old PM the RunCSIRInstr and CSIRProfile are handled regardless of whether SampleProfile is set, whereas in the new PM they are only handled when SampleProfile is not set? What should be the behavior if sample profile is specified along with CS profiling?

xur marked an inline comment as done.Dec 3 2018, 12:23 PM

As for the tests, I meant to add test cases to this patch later, not through a different patch.
I can add tests now.

lib/LTO/LTOBackend.cpp
284

It's not deliberate. I should have make the behavior consistent.
I think I would like to change new PM and make CSProfile independent of SampleProfile.
The reason is that -fprofile-use and -fprofile-sample-use are independent.
This usage does not make a lot of sense, and we should probably give some warnings. But that should be in command line handling, not here.

What do you think?

In D54175#1317243, @xur wrote:

As for the tests, I meant to add test cases to this patch later, not through a different patch.

Oh got it - that's totally fine. I misunderstood.

I can add tests now.

include/llvm/Passes/PassBuilder.h
44–46

Shouldn't this be (Action == NoAction || ...)? I.e. either we are doing no PGO action, or we better have one of the other fields set?

lib/LTO/LTOBackend.cpp
284

Yeah it would be good to make them consistent. Looking at the new PM though, I'm not sure how you can make them independent with the existing PGOOptions interface.

I noticed a similar inconsistency between new and old PM on the clang side patch, although it looks like it predates your changes (with the newPM the sample profile is only looked at if no IR profiling enabled, in the old PM it is looked at unconditionally).

xur marked 2 inline comments as done.Dec 3 2018, 3:26 PM
xur added inline comments.
include/llvm/Passes/PassBuilder.h
44–46

There are 4 possible values for Action: NoAction, IRInstr, CSIRInstr, CSIRUse
If NoAction, we need to have SampleProfileFile nonempty or ProfileUseFile nonempty. They are for sample fdo and fdo-use.
If the value is of the other 3 values, we don't check SampleProfile and ProfileUseFile.

I think this is correct but I agree the check can be improved.

lib/LTO/LTOBackend.cpp
284

SampleProfileFile, ProfileGenFile and ProfileUseFile are of different fields in PGOOptions interface. I think technically they can be independently set /checked. But it's really confusing if SampleProifle and Profile{Gen|Use}File are both on.

tejohnson added inline comments.Dec 4 2018, 6:49 AM
include/llvm/Passes/PassBuilder.h
44–46

I see. I think it would be a lot clearer if the Action was extended then to cover all the possible pgo types, it's confusing to have NoAction be set when e.g. sample profile or IR profile use is enabled.

lib/LTO/LTOBackend.cpp
284

Agree. Probably add some checking into PGOOptions that only one is set (maybe in combination with the extension to Action that I suggested above)?

xur updated this revision to Diff 177973.Dec 12 2018, 4:32 PM

Synced to newest change and fixed a few issues.

xur added a comment.Dec 12 2018, 4:37 PM

Ping.

If the reviewers are OK with the high level structures and the interfaces, I will start to add test cases.
I have also done the lld patch. I will submit a separated patch for review.

tejohnson added inline comments.Dec 13 2018, 9:08 AM
include/llvm/Passes/PassBuilder.h
44–46

Please address this comment (it's really unintuitive to have NoAction mean sample profile or IR profile use).

lib/LTO/LTOBackend.cpp
284

ping on this suggestion.

lib/Passes/PassBuilder.cpp
768

Add comment about the LTOPreLink check.

lib/Transforms/IPO/PassManagerBuilder.cpp
566

Document constant parameter (here and elsewhere). E.g. /*IsCS=*/true.

xur updated this revision to Diff 178479.Dec 17 2018, 9:36 AM

This new pass addresses the review comments from Teresa. Main changes are:
(1) used two enums in PGOOptions:

enum PGOAction { NoAction, IRInstr, IRUse, SampleUse };
enum CSPGOAction { NoCSAction, CSIRInstr, CSIRUse };

The first enum is for regular PGO. We will use one ProfileFile string to represent the profile (instrument path or profile path).
CSPGOAction can be set when PGOAction == NaAction, or PGOAction==IRUse.
for CSIRUse, the profile is the same as ProfileFile. For CSIRInstr, we need another string CSProfileGenFile to represent instrumentation file name.

NewPMDDriver already use a similar method for regular PGO.

(2) Changed the clang options handling to give error message when -fsample-profile-use is used with -fprofile-use or -fprofile-generate.

(3) added support in tools/opt/NewPMDriver.cpp. This will be used in tests.

xur updated this revision to Diff 181595.Jan 14 2019, 10:38 AM

Fixed the issue with lld (where lld is not happy with the COMDAT variables created
after LTO/ThinLTO)

Add the support of clang bootstrap for CSPGO.

Update the patch.

Tested this patch with google internal benchmarks (mostly for distributed ThinLTO).

Tested this patch with clang bootstrap for CSPGO.
(1) plain CSPGO (new and legacy pass manager).
(2) Thinlto CSPGO with plugin-opt to lld (new-experimental and legacy pass manager).
(3) Fulllto CSPGO with plugin-opt to lld (new-experimental and legacy pass manager).
CSPGO improves 1%-4% (wall time of ninja clang) upon regular PGO in clang bootstrap.
Thinlto and Fulllto is ~4% (wall time of ninja clang) in the new-experimental pass manager.
Compiling of individual files can improve up to 20%.

I'm working to add tests to this patch

Clang patch:
https://reviews.llvm.org/D54176

Compiler-rt patch:
https://reviews.llvm.org/D54177

lld patch:
https://reviews.llvm.org/D56675

Bunch of mostly nit suggestions about names and commenting, and a few questions. I like the improved profile action expression.

CMakeLists.txt
575 ↗(On Diff #181595)

nit: indentation seems off

cmake/modules/HandleLLVMOptions.cmake
816 ↗(On Diff #181595)

s/Nedd/Need

Also, I don't understand how the comment about needing to pass it to the backend compiler after ld-plugin relates to the handling below.

826 ↗(On Diff #181595)

This part seems unrelated - probably split into a different patch?

include/llvm/Passes/PassBuilder.h
50

document different values (in both enums)

60

This ordering doesn't seem to match the ordering of the checks in the assert below. In any case, rather than have the documentation just duplicate the code (it doesn't really add anything this way), it would be better to split up the following assert into multiple, with more of a plain english explanation of each one.

include/llvm/ProfileData/InstrProf.h
770

document

1016

this is just the addition of bit 60 being the CS flag, right? Maybe just say that?

include/llvm/ProfileData/InstrProfReader.h
438

Document new parameter here and elsewhere. I assume it is legal to have both a CS and a non-CS summary in this object? In that case, maybe the parameter name should be something like UseCS.

include/llvm/Transforms/Instrumentation/InstrProfiling.h
64 ↗(On Diff #181595)

document variable

include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
38

document const parameter

lib/Passes/PassBuilder.cpp
520

Why do we skip size optimizations with CS?

tejohnson added inline comments.Jan 16 2019, 1:47 PM
lib/Passes/PassBuilder.cpp
772

document const parameter (here and in below call)

1014

document const parameter

1137

document const parameter (here and below call)

lib/ProfileData/InstrProf.cpp
1045 ↗(On Diff #181595)

nit - in createProfileNameVar the linkage initially set to weak any, then reset to external if supports comdat. Here you do the opposite - suggest making the logic the same in both places for consistency

lib/ProfileData/InstrProfReader.cpp
815–817

document const parameter here and in below call

lib/Transforms/IPO/PassManagerBuilder.cpp
535

s/proifle/profile/

536

Can you add a better explanation about linker issue - i.e. the linker wants to see all variables before the LTO link since it needs to resolve symbols/comdats.

568

document const parameter (here and below)

lib/Transforms/Instrumentation/InstrProfiling.cpp
276

document rationale for this equation

1010

Comment needed

lib/Transforms/Instrumentation/PGOInstrumentation.cpp
397

document

422

document

506

document

613

Probably needs a slight expansion on why that other info shouldn't affect function hash.

625

why reorder this?

1420

document const parameter. also add comment for why this is not needed when IsCS

1467

I don't see where the new parameter PSI is getting used.

1602

Here and below, needs comment about why PSI when IsCS. But see comment in annotateAllFunctions - I don't see it getting used there?

tools/llvm-profdata/llvm-profdata.cpp
701

How about replace these 4 lines with just:

if (FuncIsCS != ShowCS)

continue;
tools/opt/NewPMDriver.cpp
116

Can you do something similar to D56749 (moving and consolidating these opts between PMs)?

231

Is this still TODO? I see handling further below.

264

As an aside - theoretically someone might want to do CS profiling without regular PGO. Is it worth supporting that combination?

unittests/ProfileData/InstrProfTest.cpp
179

document const param

805

ditto

xur marked 41 inline comments as done.Jan 23 2019, 9:58 AM
xur added inline comments.
cmake/modules/HandleLLVMOptions.cmake
816 ↗(On Diff #181595)

Fixed the typo.

I think currently -profile-instr-use flag is only added to CMAKE_CXX_FLAGS and CMAKE_C_FLAGS. I need them to pass to linker and shared linker also. Since there is no flag for CSIR use build. I have to pass them unconditionally. For regular PGO, the flag is void operation.

I'll remove the line of CMAKE_CXX_FLAGS and CMAKE_C_FLAGS

826 ↗(On Diff #181595)
include/llvm/Passes/PassBuilder.h
60

Make sense. I will change this part.

lib/Passes/PassBuilder.cpp
520

This is for the early inline. We skip the early inline if for optimization for size.
For CS, we will not do early inline either.

lib/Transforms/Instrumentation/PGOInstrumentation.cpp
613

This change actually will affect function hash -- if an old profile is passed here, an non-cs record could be recognized as CS profile. But that will most likely result in a CFG hash mismatch where the user will get a warning.

625

not intentionally. I will revert this change.

1467

Good catch! This is from my other changes that fixes BB profile count. They should be removed.

1602

Fixed. PSI is not used here.

tools/llvm-profdata/llvm-profdata.cpp
701

Yes!

tools/opt/NewPMDriver.cpp
231

Yes. we should remove this TODO.

264

Since both CSprofile and non-CS-profile are merged into one profile. I think it's earlier to assume regular PGO is enabled for CSPGO. If there is a need for support the other way, I will add later.

xur updated this revision to Diff 183124.Jan 23 2019, 10:12 AM
xur marked 6 inline comments as done.

Integrated Teresa's review comments.

Also fixed a bug that can cause mismatch in new pass manager -- some optimization passes are under guard of PGOOptions::IRUse. LTOBackend sets PGOAction to NOAction for CSIRInstr (because regular profile is not passed down for CSIRInstr). This leads to different IR for CSIRInstr and CSIRUse.

This patch sets the PGOAction to IRUSE for CSIRInstr.

xur updated this revision to Diff 183899.Jan 28 2019, 10:09 AM

Fixed a pass ordering issue in legacy pass manager for ThinLTO: Need to place CS Instrumentation after EliminateAvailableExternallyPass. This can cause duplicate profile variables that leads to bad profile for online profile merge.

Also synced the trunk and fixed a merge issue.

Mostly minor comments inline. But still missing tests. Needs testing via new gold-plugin and opt options. Should also add options to llvm-lto2 to enable testing the LTOBackend path without needing gold or any other specific linker.

include/llvm/Passes/PassBuilder.h
71

Should we also have IRUse when CSAction is CSIRInstr?

include/llvm/ProfileData/InstrProfReader.h
438

The same change should be done elsewhere in this class (document and change to UseCS). There are a couple other interfaces with the same option.

include/llvm/ProfileData/InstrProfWriter.h
78

Document new parameter (there is currently nowhere in the file that indicates CS means context sensitive).

include/llvm/Transforms/Instrumentation.h
92

Document new parameter here and further down in createInstrProfilingLegacyPass (there is currently nowhere in the file that indicates CS means context sensitive).

lib/Passes/PassBuilder.cpp
520

Please add comment to that effect.

lib/Transforms/IPO/PassManagerBuilder.cpp
288

Similar suggestion to the one I had for new PM (add comment about now not when IsCS).

lib/Transforms/Instrumentation/InstrProfiling.cpp
1010

typo in instrumentation (missing n in the middle).

tools/opt/opt.cpp
290 ↗(On Diff #183899)

The reason I suggested moving this here was to use the same set of options to set up CSPGO for the old PM. But it doesn't seem like it is being set up when invoked via opt. I'd expect something down in AddOptimizationPasses like where it sets up regular IR PGO PassManagerBuilder options.

Please also add a profile dumping/reading round trip test with isCS flag (there are existing similar test for llvm-profdata)

text-format --> index-format-> text-format : diff text-format

include/llvm/ProfileData/InstrProfWriter.h
92

What is this (promotion)'s use case?

include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
31

Add documentation for the new class. Also explain why it needs to be created in a pass.

lib/Analysis/ProfileSummaryInfo.cpp
95

This is confusing. It is better to just pass 'IsCS' flag to getProfileSummary method.

lib/ProfileData/InstrProf.cpp
1015 ↗(On Diff #183899)

nit: createProfileFileNameVar.

Can this one be split out as a NFC change?

tools/llvm-profdata/llvm-profdata.cpp
879

New option needs to be documented in command line guide.

xur updated this revision to Diff 185907.Feb 7 2019, 6:24 PM
xur marked 14 inline comments as done.

Integrated the review comments from Teresa and David.
Added unit tests for
(1) llvm-profdata
(2) context-sensitive profile summary
(3) opt new options and passes
(4) gold plugin
(5) using lto2 to test csgen and csuse.

davidxl added inline comments.Feb 8 2019, 9:09 AM
include/llvm/ProfileData/InstrProfWriter.h
92

Can you explain the 'promotion' case here (as comments)?

lib/Analysis/ProfileSummaryInfo.cpp
95

how about this comment?

lib/Transforms/Instrumentation/PGOInstrumentation.cpp
462

Remove this.

xur updated this revision to Diff 185994.Feb 8 2019, 9:44 AM

Fixed the missing changes David mentioned.

include/llvm/Passes/PassBuilder.h
71

CSIRInstr can run without IRUSE. CSIRUse needs to IRUse (the only reason is they share the profile).

include/llvm/ProfileData/InstrProfWriter.h
92

when merging a non-cs profile with cs profile, the result is cs profile.

lib/ProfileData/InstrProf.cpp
1015 ↗(On Diff #183899)

split the change to nfc r353230.

tools/opt/opt.cpp
290 ↗(On Diff #183899)

OK. Got it.

xur updated this revision to Diff 186019.Feb 8 2019, 11:50 AM

Forgot to include one more test:
A test/Other/cspgo-O2-pipeline.ll

xur updated this revision to Diff 186529.Feb 12 2019, 12:23 PM

Removed the unrelated instructions (comments) from the tests.

LGTM, but wait for @davidxl to do final approval since I'm not sure his last comments addressed.

davidxl accepted this revision.Feb 25 2019, 9:27 AM

lgtm

This revision is now accepted and ready to land.Feb 25 2019, 9:27 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2019, 2:38 PM
ABataev added inline comments.
llvm/trunk/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
41 ↗(On Diff #188465)

Seems to me, it breaks the build with the shared libraries. Could you take a look at this?
lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o: In function `llvm::PGOInstrumentationGenCreateVar::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)':
/build/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h:42: undefined reference to `llvm::createProfileFileNameVar(llvm::Module&, llvm::StringRef)'
/build/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h:43: undefined reference to `llvm::createIRLevelProfileFlagVar(llvm::Module&, bool)'

rupprecht added inline comments.
llvm/trunk/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
41 ↗(On Diff #188465)

Not OP, but I also hit this. rL355346 should fix it.

ABataev added inline comments.Mar 4 2019, 2:57 PM
llvm/trunk/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
41 ↗(On Diff #188465)

Thanks for the fix!

hctim added a subscriber: hctim.Mar 6 2019, 12:55 PM

Looks like this commit has broken the clang-hexagon-elf#23291: build. PTAL :)

FAIL: LLVM :: Transforms/PGOProfile/thinlto_cspgo_gen.ll (41188 of 44167)
******************** TEST 'LLVM :: Transforms/PGOProfile/thinlto_cspgo_gen.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/opt -module-summary /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/thinlto_cspgo_gen.ll -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc
: 'RUN: at line 2';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/opt -module-summary /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/Inputs/thinlto_cspgo_bar_gen.ll -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc
: 'RUN: at line 3';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/llvm-lto2 run -lto-cspgo-profile-file=alloc -lto-cspgo-gen -save-temps -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc,foo,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc,bar,l    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc,main,plx    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc,__llvm_profile_filename,plx    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp1.bc,__llvm_profile_raw_version,plx    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc,bar,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc,odd,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc,even,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc,__llvm_profile_filename,x    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp2.bc,__llvm_profile_raw_version,x
: 'RUN: at line 14';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/llvm-dis /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_gen.ll.tmp.1.4.opt.bc -o - | /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/FileCheck /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/thinlto_cspgo_gen.ll --check-prefix=CSGEN
--
Exit Code: 1

Command Output (stderr):
--
llvm-lto2: LTO::run failed: No available targets are compatible with triple "x86_64-unknown-linux-gnu"

--

********************
PASS: LLVM :: Transforms/PartiallyInlineLibCalls/nobuiltin.ll (41189 of 44167)
PASS: LLVM :: Transforms/PartiallyInlineLibCalls/bad-prototype.ll (41190 of 44167)
PASS: LLVM :: Transforms/PGOProfile/memop_size_annotation.ll (41191 of 44167)
PASS: LLVM :: Transforms/PGOProfile/loop1.ll (41192 of 44167)
PASS: LLVM :: Transforms/PhaseOrdering/2010-03-22-empty-baseclass.ll (41193 of 44167)
PASS: LLVM :: Transforms/PGOProfile/loop2.ll (41194 of 44167)
FAIL: LLVM :: Transforms/PGOProfile/thinlto_cspgo_use.ll (41195 of 44167)
******************** TEST 'LLVM :: Transforms/PGOProfile/thinlto_cspgo_use.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/opt -module-summary /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/thinlto_cspgo_use.ll -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp1.bc
: 'RUN: at line 2';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/opt -module-summary /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/Inputs/thinlto_cspgo_bar_use.ll -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp2.bc
: 'RUN: at line 3';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/llvm-profdata merge /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/Inputs/thinlto_cs.proftext -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp3.profdata
: 'RUN: at line 4';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/llvm-lto2 run -lto-cspgo-profile-file=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp3.profdata -save-temps -o /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp1.bc /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp2.bc    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp1.bc,foo,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp1.bc,bar,l    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp1.bc,main,plx    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp2.bc,bar,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp2.bc,odd,pl    -r=/var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp2.bc,even,pl
: 'RUN: at line 11';   /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/llvm-dis /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/test/Transforms/PGOProfile/Output/thinlto_cspgo_use.ll.tmp.1.4.opt.bc -o - | /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/stage1/bin/FileCheck /var/lib/buildbot/slaves/hexagon-build-03/clang-hexagon-elf/llvm/test/Transforms/PGOProfile/thinlto_cspgo_use.ll --check-prefix=CSUSE
--
Exit Code: 1

Command Output (stderr):
--
llvm-lto2: LTO::run failed: No available targets are compatible with triple "x86_64-unknown-linux-gnu"

--

********************
ABataev removed a subscriber: ABataev.Mar 6 2019, 12:57 PM
pcc added a subscriber: pcc.Apr 1 2019, 9:00 PM
pcc added inline comments.
llvm/trunk/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
673 ↗(On Diff #188465)

Hi Rong, I discovered a problem with this line of code. If the number of select instructions when the profile was collected differs by a multiple of 16 from the number when the profile is used, we get a hash collision which results in us hitting an assertion here:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#1042

Granted, the same problem seems to have already existed for differences that are multiples of 256, but this seems to make the problem more likely to occur.

xur marked 2 inline comments as done.Apr 2 2019, 2:51 PM
xur added inline comments.
llvm/trunk/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
673 ↗(On Diff #188465)

I sent this patch for review.

https://reviews.llvm.org/D60154