This is an archive of the discontinued LLVM Phabricator instance.

[InstrProf] Temporal Profiling
ClosedPublic

Authored by ellis on Mar 30 2023, 5:48 PM.

Details

Summary

As described in [0], this extends IRPGO to support Temporal Profiling.

When -pgo-temporal-instrumentation is used we add the llvm.instrprof.timestamp() intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function INSTR_PROF_PROFILE_SET_TIMESTAMP(). A new field in the llvm_prf_cnts section stores each function's timestamp. Then in llvm-profdata merge we convert these function timestamps into a trace and add it to the indexed profile.

Since these traces could significantly increase the profile size, we've added -max-temporal-profile-trace-length and -temporal-profile-trace-reservoir-size to limit the length of a trace and the number of traces in a profile, respectively.

In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.

Special thanks to Julian Mestre for helping with reservoir sampling.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Diff Detail

Event Timeline

ellis created this revision.Mar 30 2023, 5:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 5:48 PM
ellis edited the summary of this revision. (Show Details)Apr 3 2023, 9:52 AM
ellis published this revision for review.Apr 3 2023, 9:52 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 3 2023, 9:52 AM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
davidxl added inline comments.Apr 4 2023, 3:12 PM
compiler-rt/include/profile/InstrProfData.inc
657

8 bits --> 12 bits

compiler-rt/lib/profile/InstrProfiling.c
27

what is -1 value reserved for?

llvm/include/llvm/ProfileData/InstrProfData.inc
657

8->12

llvm/lib/ProfileData/InstrProfReader.cpp
281

Describe trace text format as comments.

648

what is this increment?

llvm/lib/ProfileData/InstrProfWriter.cpp
311

Is it better to do deeper merge, e.g. b -c -d can be merged with a-b-c-d-e, and a-b and be merged with b-c. A trace count can also be added for each merged trace to show the weight.

ellis updated this revision to Diff 510965.Apr 4 2023, 4:50 PM
ellis marked 3 inline comments as done.

Add/update comments

compiler-rt/lib/profile/InstrProfiling.c
27

For counts instrumentation the __llvm_prf_cnts section is initialized to zero while for coverage instrumetation the section is initialized to all ones. So both 0 and -1 are uninitialized values. See https://github.com/llvm/llvm-project/blob/a78997e0aa89250a193ff036e7c6e71562e03222/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L911-L926

llvm/lib/ProfileData/InstrProfReader.cpp
648

I added a clarifying comment. Hopefully it makes sense :)

llvm/lib/ProfileData/InstrProfWriter.cpp
311

That's an interesting idea, but I think this will lose information. If b-c-d is merged with a-b-c-d-e then we no longer remember that b was the first function in a trace. This is important because functions called early in a trace are important during startup.

Also, in practice I would guess most traces would differ in small sections scattered throughout the trace, making this type of merging less useful. We could probably "compress" these traces by storing common runs of functions, but this is overkill IMO. With TraceReservoirSize = 100 and MaxTraceLength = 10000 the max overhead is 8Mb I believe.

Looks good overall, added a few comments which are mostly cosmetic.

compiler-rt/include/profile/InstrProfData.inc
670

If we used the 63rd bit then we wouldn't have to update the mask. Is there reason why we can't do so?

llvm/include/llvm/ProfileData/InstrProfReader.h
535

I think in the common case the indexed format will contain instrumentation profiles and TemporalProfiles. If so, then naming this hasTemporalProfile would be nice.

llvm/lib/ProfileData/InstrProfReader.cpp
270–274

Prefer a more descriptive text here: "temporal_profile" or "function_traces". Also see other comment about using a consistent naming scheme.

1141

Add a comment about the expected information to be read, i.e. NumTraces and TraceStreamSize?

llvm/lib/ProfileData/InstrProfWriter.cpp
303

This feels like it would lead to non-deterministic builds where the raw profile is an input to the build. Is the seed deterministic?

341

Can we use auto& to avoid a copy for Trace?

440

There seem to be a few different phrases referring to the same functionality, e.g. TemporalProfile, FunctionTraces and just Trace in other places. It would be nice to have a single prefix for all the var and function names to be able to quickly navigate the code. What do you think?

llvm/unittests/ProfileData/InstrProfTest.cpp
233

Add a comment for the param set to false, I think it should be /*Sparse=*/=false. Here and below.

ellis updated this revision to Diff 511545.Apr 6 2023, 3:21 PM
ellis marked 4 inline comments as done.

Resolve some comments

compiler-rt/include/profile/InstrProfData.inc
670

Oh my mistake, we can in fact use the 63rd bit here. I think I was anticipating this bit being used in my earlier revisions. Thanks for catching!

llvm/lib/ProfileData/InstrProfWriter.cpp
303

I've added std::mt19937 RNG to InstrProfWriter which has a constant default seed which should make this deterministic.

440

I see what you mean. I'm using FunctionTrace and Trace interchangeably and they aren't very descriptive. What do you think of the TP (Temporal Profiling) prefix? So I would use TPTrace in code and :tp_traces for the text format.

ellis updated this revision to Diff 511546.Apr 6 2023, 3:24 PM

Remove comments

snehasish added inline comments.Apr 6 2023, 4:26 PM
llvm/include/llvm/ProfileData/InstrProfReader.h
138

nit: s/is/has

llvm/lib/ProfileData/InstrProfWriter.cpp
440

I prefer something more verbose like "TemporalProf", but up to you to decide, "TP" as the prefix is fine too.

ellis updated this revision to Diff 511577.Apr 6 2023, 5:56 PM
ellis marked 2 inline comments as done.

Rename trace variables to be more consistent

llvm/lib/ProfileData/InstrProfWriter.cpp
440

Thanks for the feedback! I ended up going with TemporalProf... and I tried to update all the variables to be more consistent.

ellis edited the summary of this revision. (Show Details)Apr 6 2023, 5:59 PM
This revision is now accepted and ready to land.Apr 6 2023, 6:25 PM
ellis added a comment.Apr 7 2023, 1:30 PM

Thanks for the quick review! Unless anyone else has concerns, I'll land on Tuesday since I'm out of town on Monday.

ellis updated this revision to Diff 512467.Apr 11 2023, 8:29 AM

Rebase and test

This revision was landed with ongoing or failed builds.Apr 11 2023, 8:31 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Apr 12 2023, 6:31 AM

This broke the profile tests on Darwin, see e.g. the end of https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8784035713603613281/+/u/package_clang/stdout

It seems these tests are checking what symbols are exported by the runtime, and are surprised to find the new __llvm_profile_global_timestamp variable there.

From what I can tell, that variable is only accessed inside the library, and making it static fixes the tests, so I'll go ahead and do that. Please revert if that doesn't seem right to you.

ellis added a comment.Apr 12 2023, 7:53 AM

This broke the profile tests on Darwin, see e.g. the end of https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8784035713603613281/+/u/package_clang/stdout

It seems these tests are checking what symbols are exported by the runtime, and are surprised to find the new __llvm_profile_global_timestamp variable there.

From what I can tell, that variable is only accessed inside the library, and making it static fixes the tests, so I'll go ahead and do that. Please revert if that doesn't seem right to you.

That seems fine to me. Thanks for fixing!

This broke the profile tests on Darwin, see e.g. the end of https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8784035713603613281/+/u/package_clang/stdout

It seems these tests are checking what symbols are exported by the runtime, and are surprised to find the new __llvm_profile_global_timestamp variable there.

From what I can tell, that variable is only accessed inside the library, and making it static fixes the tests, so I'll go ahead and do that. Please revert if that doesn't seem right to you.

Seems good. If we make the variable internal linkage, we can drop the leading __, but we don't need to make the change right now :)

(Thanks for the feature and I will try studying it after a long vacation...)