This is an archive of the discontinued LLVM Phabricator instance.

[SamplePGO] Stale profile matching(part 1)
ClosedPublic

Authored by wlei on Apr 3 2023, 10:14 AM.

Details

Summary

AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.

This patch is the part 1 of stale profile matching, summary of the implementation:

  • Added a structure for the matching result:LocToLocMap, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile).
  • In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(IRToProfileLocationMap) to FunctionSamples.
  • Added a wrapper(mapIRLocToProfileLoc) for the query to the location, the location from input IR will be remapped to the matched profile location.
  • Added a new switch --salvage-stale-profile.
  • Some refactoring for the staleness detection.

Test case is in part 2 with the matching algorithm.

Diff Detail

Event Timeline

wlei created this revision.Apr 3 2023, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2023, 10:14 AM
wlei requested review of this revision.Apr 3 2023, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2023, 10:14 AM
wlei edited the summary of this revision. (Show Details)Apr 4 2023, 9:54 AM
wlei added reviewers: hoy, wenlei, xur, davidxl.
hoy added inline comments.Apr 5 2023, 10:55 AM
llvm/include/llvm/ProfileData/SampleProf.h
1247

nit: "generated by" -> "used for"

llvm/lib/Transforms/IPO/SampleProfile.cpp
448

nit: For each function, the matcher generates a map, of which each entry is a mapping from the location
of current build to the location in the profile.

451

Where is this DS populated? Is it in a separate patch?

2197

Condition this under ReportProfileStaleness || PersistProfileStaleness ?

Thanks for working on this. Looks good overall, except a few naming nits.

llvm/include/llvm/ProfileData/SampleProf.h
845

similarly, mapIRLocToProfileLoc?

1264

How about name it IRToProfileLocationMap to make it more explicit.

llvm/lib/Transforms/IPO/SampleProfile.cpp
134

nit: we always match profile regardless of staleness. how about something like salvage-stale-profile (or sample-profile-salvage-stale)

451

same question. I saw it being populated in the 2nd patch, but the split between the two patches is a bit weird if its population is left out. not a big deal though as long as the entire stack works..

2100

With the refactoring, this function is now narrower, and perhaps we could rename it to something like countProfileMismatches to reflect the narrowed scope.

Nice.

Just browsed the patch, I don't seem to find how FUncToMatchingsMap is populated which is the core of the stale profile matching algorithm. Does algorithm use some anchor points to do the remapping?

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

hoy added a comment.Apr 5 2023, 11:53 AM

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

llvm/lib/Transforms/IPO/SampleProfile.cpp
451

It is better make that part available to review at the same time for better context.

mtrofin added inline comments.
llvm/include/llvm/ProfileData/SampleProf.h
1061

drive-by nits:

  • s/M/LTLM or basically anything that's not suggestive of "M == Module", which is seen everywhere else in the codebase
  • can StaleProfileMappings be const LocToLocMap? (and same then for the parameter of this setter) - it would simplify the burden on a future reader/maintainer
  • is StaleProfileMappings expected to be updated multiple times? if not, could you assert(StaleProfileMappings == null && "this should be set only once") - same maintainability argument.
llvm/lib/Transforms/IPO/SampleProfile.cpp
483

drive-by nit: could this be called distributeStaleProfileMappings?

484

do these need to be public?

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

By doing the mapping during profile generation outside of compiler, you would need a new profile for each target source version. That seems less flexible comparing to doing the mapping in the compiler where the mapping is naturally built for the current source being compiled.

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

By doing the mapping during profile generation outside of compiler, you would need a new profile for each target source version. That seems less flexible comparing to doing the mapping in the compiler where the mapping is naturally built for the current source being compiled.

In practice, it does not need to be -- aka many kinds of source changed don't need to trigger profile updating. Changes to cold source paths don't matter either. It is ok for the target source revision of the updated profile to be behind that of the actual build.

wenlei added a comment.Apr 5 2023, 2:12 PM

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

By doing the mapping during profile generation outside of compiler, you would need a new profile for each target source version. That seems less flexible comparing to doing the mapping in the compiler where the mapping is naturally built for the current source being compiled.

In practice, it does not need to be -- aka many kinds of source changed don't need to trigger profile updating. Changes to cold source paths don't matter either. It is ok for the target source revision of the updated profile to be behind that of the actual build.

Sure, then you need to make sure the source version you are generating profile for doesn't diverge too much from the source version that you will actually compile. Often times we also can't predict the source version that will actually be used for compilation later, what if a last minute sizable hotfix is landed? What you described should be doable, but more work/complexity IMO. Isn't it easier if this is all handled at compile time transparently within the compiler, given the cost of doing that isn't really measurable.

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

By doing the mapping during profile generation outside of compiler, you would need a new profile for each target source version. That seems less flexible comparing to doing the mapping in the compiler where the mapping is naturally built for the current source being compiled.

In practice, it does not need to be -- aka many kinds of source changed don't need to trigger profile updating. Changes to cold source paths don't matter either. It is ok for the target source revision of the updated profile to be behind that of the actual build.

Sure, then you need to make sure the source version you are generating profile for doesn't diverge too much from the source version that you will actually compile. Often times we also can't predict the source version that will actually be used for compilation later, what if a last minute sizable hotfix is landed? What you described should be doable, but more work/complexity IMO. Isn't it easier if this is all handled at compile time transparently within the compiler, given the cost of doing that isn't really measurable.

The updating can be done on-demand based on triggers (source changes or direct user directive). Doing this offline can open up more expensive fuzzy algorithms (open ended). Having a layer in the compiler to do this is fine -- though would like to see how effective it is for our internal workloads).

wlei added a comment.Apr 5 2023, 2:38 PM

Thank you for all the feedback, I will try to address it later. I saw you all are curious the algorithm part, here it is: https://reviews.llvm.org/D147545.

Just browsed the patch, I don't seem to find how FUncToMatchingsMap is populated which is the core of the stale profile matching algorithm. Does algorithm use some anchor points to do the remapping?

Yes, here is the algorithm part : https://reviews.llvm.org/D147545, it leverages the callee name/ calltarget name as anchor which doesn't need any other additional info.

Thank you for all the feedback, I will try to address it later. I saw you all are curious the algorithm part, here it is: https://reviews.llvm.org/D147545.

Just browsed the patch, I don't seem to find how FUncToMatchingsMap is populated which is the core of the stale profile matching algorithm. Does algorithm use some anchor points to do the remapping?

Yes, here is the algorithm part : https://reviews.llvm.org/D147545, it leverages the callee name/ calltarget name as anchor which doesn't need any other additional info.

thanks. The anchor based remapping can be effective to formatting changes, NFC changes such as adding new comments in the source .

Thank you for all the feedback, I will try to address it later. I saw you all are curious the algorithm part, here it is: https://reviews.llvm.org/D147545.

Just browsed the patch, I don't seem to find how FUncToMatchingsMap is populated which is the core of the stale profile matching algorithm. Does algorithm use some anchor points to do the remapping?

Yes, here is the algorithm part : https://reviews.llvm.org/D147545, it leverages the callee name/ calltarget name as anchor which doesn't need any other additional info.

Just curious, I wonder if there happens to be a measurement of the percentage of samples recovered (that would have been dropped) for csspgo. I guess I'm trying see how much headroom remains with D147545 :)

wenlei added a comment.Apr 5 2023, 5:10 PM

By the way, another way is to offline remapping --- basically 'correcting' the stale profile by mapping the old locations to the new locations and produce a refreshed profile. This is what we plan to do internally.

This sounds interesting. By offline do you mean the profile generation time? Wondering how to get the new locations without building the new binary.

Right -- the source revision information is available for any target in the build environment.

By doing the mapping during profile generation outside of compiler, you would need a new profile for each target source version. That seems less flexible comparing to doing the mapping in the compiler where the mapping is naturally built for the current source being compiled.

In practice, it does not need to be -- aka many kinds of source changed don't need to trigger profile updating. Changes to cold source paths don't matter either. It is ok for the target source revision of the updated profile to be behind that of the actual build.

Sure, then you need to make sure the source version you are generating profile for doesn't diverge too much from the source version that you will actually compile. Often times we also can't predict the source version that will actually be used for compilation later, what if a last minute sizable hotfix is landed? What you described should be doable, but more work/complexity IMO. Isn't it easier if this is all handled at compile time transparently within the compiler, given the cost of doing that isn't really measurable.

The updating can be done on-demand based on triggers (source changes or direct user directive). Doing this offline can open up more expensive fuzzy algorithms (open ended). Having a layer in the compiler to do this is fine -- though would like to see how effective it is for our internal workloads).

Ok. Full disclosure though, we started with ideas around more sophisticated fuzzy graph matching for CSSPGO (we even emit cfg info into profile for some experiments), but settled on simpler algorithm since that worked good enough. We will upstream additional "breaking change" for probe Id assignment later that pushes this to be more effective for CSSPGO.

wlei added a comment.Apr 5 2023, 6:04 PM

Thank you for all the feedback, I will try to address it later. I saw you all are curious the algorithm part, here it is: https://reviews.llvm.org/D147545.

Just browsed the patch, I don't seem to find how FUncToMatchingsMap is populated which is the core of the stale profile matching algorithm. Does algorithm use some anchor points to do the remapping?

Yes, here is the algorithm part : https://reviews.llvm.org/D147545, it leverages the callee name/ calltarget name as anchor which doesn't need any other additional info.

Just curious, I wonder if there happens to be a measurement of the percentage of samples recovered (that would have been dropped) for csspgo. I guess I'm trying see how much headroom remains with D147545 :)

Good point, so far no, it's doable, we have the https://reviews.llvm.org/D136627 to compute how many samples are dropped due to the CSSPGO's checksum mismatch, which is what this stale profile matching want to reuse. Similarly, after the matching in D147545, we can compute how many samples/callsites in profile still remains mismatched.

wlei added inline comments.Apr 5 2023, 10:22 PM
llvm/include/llvm/ProfileData/SampleProf.h
1061

Thanks for the feedback, all make sense, done.

1247

This is the output of the matching algorithm, not to be used for the matching:)

1264

Sounds good, done.

llvm/lib/Transforms/IPO/SampleProfile.cpp
134

Sounds good.

451

Ok, sorry for the confusing. I will move it to the second part.

483

Sounds good.

484

Will move to private.

2100

Done.

2197

Done. Note: this is also going to be used for matching work for classic AutoFDO. For matching, it need this mismatch metrics to suggest whether to use the matching, for pseudo probe, there is the checksum but doesn't apply to AutoFDO. Will leave when we worked on AutoFDO.

wlei updated this revision to Diff 511274.Apr 5 2023, 10:22 PM

addressing feedback

wenlei accepted this revision.Apr 6 2023, 12:34 PM

lgtm (change description needs updating to reflect new switch name)

This revision is now accepted and ready to land.Apr 6 2023, 12:34 PM
davidxl added inline comments.Apr 7 2023, 10:57 AM
llvm/include/llvm/ProfileData/SampleProf.h
848

NewLoc can be confusing. Perhaps name it "OrigLoc" or "ProfileLoc".

wlei updated this revision to Diff 514308.Apr 17 2023, 10:44 AM

Updating D147456: [AutoFDO] Stale profile matching(part 1)

llvm/include/llvm/ProfileData/SampleProf.h
848

Changed to ProfileLoc , thanks!

wlei edited the summary of this revision. (Show Details)Apr 21 2023, 1:26 PM
wlei retitled this revision from [AutoFDO] Stale profile matching(part 1) to [SamplePGO] Stale profile matching(part 1).Apr 26 2023, 10:05 AM
wlei edited the summary of this revision. (Show Details)Apr 28 2023, 12:56 PM
This revision was landed with ongoing or failed builds.Apr 28 2023, 1:13 PM
This revision was automatically updated to reflect the committed changes.

It looks like this change or its followup is causing test failures on the Fuchsia toolchain builders:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8782540257332498033/test-results?sortby=&groupby=

Script:
--
: 'RUN: at line 3';   /b/s/w/ir/x/w/staging/llvm_build/bin/opt < /b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll -passes=sample-profile -sample-profile-file=/b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof --salvage-stale-profile -S --debug-only=sample-profile 2>&1 | /b/s/w/ir/x/w/staging/llvm_build/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll
--
Exit Code: 1

Command Output (stderr):
--
/b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll:66:10: error: CHECK: expected string not found in input
; CHECK: Callsite with callee:foo is matched from 13 to 6
         ^
<stdin>:12:34: note: scanning from here
Location is matched from 11 to 11
                                 ^
<stdin>:13:1: note: possible intended match here
Callsite with callee:foo is matched from 13 to 0
^
wlei added a comment.Apr 28 2023, 3:59 PM

It looks like this change or its followup is causing test failures on the Fuchsia toolchain builders:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8782540257332498033/test-results?sortby=&groupby=

Script:
--
: 'RUN: at line 3';   /b/s/w/ir/x/w/staging/llvm_build/bin/opt < /b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll -passes=sample-profile -sample-profile-file=/b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-stale-profile-matching.prof --salvage-stale-profile -S --debug-only=sample-profile 2>&1 | /b/s/w/ir/x/w/staging/llvm_build/bin/FileCheck /b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll
--
Exit Code: 1

Command Output (stderr):
--
/b/s/w/ir/x/w/llvm-llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-stale-profile-matching.ll:66:10: error: CHECK: expected string not found in input
; CHECK: Callsite with callee:foo is matched from 13 to 6
         ^
<stdin>:12:34: note: scanning from here
Location is matched from 11 to 11
                                 ^
<stdin>:13:1: note: possible intended match here
Callsite with callee:foo is matched from 13 to 0
^

Sorry for the break, push an attempt to fix https://reviews.llvm.org/rGba3cbc7aadf9, let me know if it doesn't work.