This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Persist profile staleness metrics into binary
ClosedPublic

Authored by wlei on Oct 25 2022, 10:12 AM.

Details

Summary

With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.

For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF .llvm_stats section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(-persist-profile-staleness)

In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.

Diff Detail

Event Timeline

wlei created this revision.Oct 25 2022, 10:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 10:12 AM
wlei requested review of this revision.Oct 25 2022, 10:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 10:12 AM
wlei edited the summary of this revision. (Show Details)Oct 25 2022, 1:45 PM
wlei added reviewers: hoy, wenlei, xur, davidxl.

The use case of the feature is still a little fuzzy to me. Why is the mismatch issue not caught and handled at build time?

The use case of the feature is still a little fuzzy to me. Why is the mismatch issue not caught and handled at build time?

With fleet-wide profiling for sample PGO, there's no much people can do at build time if we don't want to block builds when stale profile is found. And it's also not that scalable to address it at build time for thousands of services. We aim to build a system that monitors profile staleness for thousands of services using sample pgo, which then drives the operational work of tightening up PGO pipeline (done by people outside of compiler team). This is the compiler part for collecting such data.

wlei added a comment.Oct 25 2022, 3:00 PM

The use case of the feature is still a little fuzzy to me. Why is the mismatch issue not caught and handled at build time?

Sorry it was not clear. It is caught and handled at build time for one object file, the previous patch is supposed to run in an early compile-time not any linker time, one set of metrics for one obj file, but we'd like to report one aggregated metrics for the whole binary, so this patch is mostly to merge/aggregate them.

Also we' like to catch and monitor it in real time not during the off-line investigation time, one issue we hit is the incremental build, the object files are built(sometimes remotely) and cached in the database(can be shared with other users), so if we want to aggregate the metrics from old cached object file, we need also to cache the compile-time warning/stdouts, that requires a bit work on the build infra side. Hence, we chose to use the way in this patch for the merge.

wenlei added inline comments.Oct 25 2022, 3:02 PM
llvm/lib/MC/MCObjectFileInfo.cpp
534

prof_stats might still be a bit narrow. persistent key-value in obj can be a general mechanism used for other purpose as well. maybe llvm-stats?

For incremental build, are the artifacts (build logs) also cached?

Anyway, I can see the usefulness of the post-build analysis case. However should we consider more general mechanism for message persistence? I will copy some folks in the review.

For incremental build, are the artifacts (build logs) also cached?

Anyway, I can see the usefulness of the post-build analysis case. However should we consider more general mechanism for message persistence? I will copy some folks in the review.

Yes, this should be more general than profile staleness. Actually the other use case we have in mind is for persisting static performance proxy, which can also be used as full reward function for MLGO.

wlei added a comment.Oct 26 2022, 10:09 AM

For incremental build, are the artifacts (build logs) also cached?

Our infra doesn't support this right now.

Anyway, I can see the usefulness of the post-build analysis case. However should we consider more general mechanism for message persistence? I will copy some folks in the review.

Thanks for the feedback, yes, this key/value structure intends to be extended for other stats, I'd appreciate more feedback to make it general.

llvm/lib/MC/MCObjectFileInfo.cpp
534

Sounds good, renamed all to llvm-stats

wlei updated this revision to Diff 470852.Oct 26 2022, 10:10 AM

rename to llvm stats.

This can be dealt with later but do we need separate support (i.e. MCAsmStreamer?) for llvm-objdump to print out stats nicely?

llvm/lib/MC/MCObjectFileInfo.cpp
534

Use explicit section type ELF::SHT_PROGBITS instead of DebugSecType.

hoy added inline comments.Oct 26 2022, 11:31 AM
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
371

S could end up null. Suggest getLLVMStatsSection to return a non-null ptr even if it's non-elf. The work should work for non-elf since we are not using comdat concept here.

hoy added inline comments.Oct 26 2022, 11:35 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
2198

nit: StringRef should work here since all keys are literal constants.

wlei added a comment.Oct 26 2022, 5:35 PM

This can be dealt with later but do we need separate support (i.e. MCAsmStreamer?) for llvm-objdump to print out stats nicely?

Good idea to dump it in a human readable way, will try it later.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
371

I see, removed the ELF condition.

llvm/lib/MC/MCObjectFileInfo.cpp
534

fixed, thanks!

llvm/lib/Transforms/IPO/SampleProfile.cpp
2198

fixed, thanks!

wlei updated this revision to Diff 470978.Oct 26 2022, 5:35 PM

Updating D136698: [SampleFDO] Persist profile staleness metrics into binary

wenlei accepted this revision.Oct 28 2022, 9:27 AM

lgtm, though I'd wait a few days in case others have comments.

This revision is now accepted and ready to land.Oct 28 2022, 9:27 AM

For incremental build, are the artifacts (build logs) also cached?

Anyway, I can see the usefulness of the post-build analysis case. However should we consider more general mechanism for message persistence? I will copy some folks in the review.

Yes, this should be more general than profile staleness. Actually the other use case we have in mind is for persisting static performance proxy, which can also be used as full reward function for MLGO.

(sorry for the delay) SGTM, I assume, if we want the values to be more complex, we can base64-encode them or something like that (i.e. we can evolve the format to not assume the values are ints); also, we may want to allow, at a later stage, the section to be populated with values available to the assembler - like exact MBB sizes - I don't think that'd be difficult to fit in at a later stage (some sort of callback to TargetLoweringObjectFileImpl.cpp or something like that). I'm listing these to check if there's any assumptions that such later evolution might invalidate.

wlei added a comment.Nov 1 2022, 2:54 PM

For incremental build, are the artifacts (build logs) also cached?

Anyway, I can see the usefulness of the post-build analysis case. However should we consider more general mechanism for message persistence? I will copy some folks in the review.

Yes, this should be more general than profile staleness. Actually the other use case we have in mind is for persisting static performance proxy, which can also be used as full reward function for MLGO.

(sorry for the delay) SGTM, I assume, if we want the values to be more complex, we can base64-encode them or something like that (i.e. we can evolve the format to not assume the values are ints); also, we may want to allow, at a later stage, the section to be populated with values available to the assembler - like exact MBB sizes - I don't think that'd be difficult to fit in at a later stage (some sort of callback to TargetLoweringObjectFileImpl.cpp or something like that). I'm listing these to check if there's any assumptions that such later evolution might invalidate.

Thank you for the feedback! Changed to use base64-encode for the value encoding.

we may want to allow, at a later stage, the section to be populated with values available to the assembler - like exact MBB sizes - I don't think that'd be difficult to fit in at a later stage (some sort of callback to TargetLoweringObjectFileImpl.cpp or something like that).

I was trying to play with the later stage values, it seems current framework should be able to be extended for this feature, i,e, it's covered as long as it can emit the values to the metadata. For the MC level values(like the MBB sizes ), IIUC, it's still able to access the metadata, And later in the AsmPrinter, TLOF.emitModuleMetadata(*OutStreamer, M); is called in AsmPrinter::doFinalization(Module &M){...}, my understanding is that doFinalization is already the very later stage, it's after the assembling things. Please let me know if I missed anything.

wlei updated this revision to Diff 472410.Nov 1 2022, 2:54 PM

Changed to use base64-encode for the value encoding.

wlei updated this revision to Diff 474116.Nov 8 2022, 5:02 PM

fix one bug: StringRef --> string

wlei added a comment.Nov 8 2022, 5:06 PM

Plan to commit this tomorrow if no any objections.

wlei edited the summary of this revision. (Show Details)Nov 9 2022, 10:33 PM
This revision was automatically updated to reflect the committed changes.