This is an archive of the discontinued LLVM Phabricator instance.

Support for remapping profile data, for instrumentation-based profiling.
ClosedPublic

Authored by rsmith on Aug 24 2018, 6:26 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

rsmith created this revision.Aug 24 2018, 6:26 PM
davidxl added inline comments.Aug 27 2018, 11:24 AM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

I suggest moving this remapping support code out of the ProfData core library into llvm-profdata.cpp where an user option (for llvm_profdata) is provided so that the profile data with old naming schemes can be converted to new format:

something like:

llvm_profdata merge -o <new_format_prof_data> -remapping_file=<> <old_profile_data>

By so doing, compiler does not need to be made aware of it.

rsmith added inline comments.Aug 27 2018, 1:09 PM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

This approach doesn't really support that workflow: we don't have the ability to form remapped mangled names at all, only the ability to tell if two names are the same modulo a set of equivalences. (As noted on D51248, if we took the "remangling" path we'd add significant complexity and also put a significant maintenance burden on ourselves to keep a remangler up to date with extensions to the demangler, which is not necessary in this equivalence-based approach.)

We could do something similar where the tool is given the old file, the remapping file, and a list of symbols in the new program, but that seems cumbersome to use (you'd need additional steps after compilation to extract the list of symbols and remap the profile data, and then you'd need to rerun the compilation with the updated profile data). The idea of this approach is that you merely need to add a remapping file and one flag to your compilations, and they will transparently be able to use profiling data from either before or after a symbol renaming with no change to the steps in the build process.

davidxl added inline comments.Aug 27 2018, 1:22 PM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

Are there libraries to do name mangling and demangling? If that is the case, linking them into the profile tool has a lot of advantage over the approach that requires compiler awareness:

  1. it can handle different 'old' C++ ABIs
  2. it can handle indirect call targets -- which is very important for performance
  3. it does not require any changes in the compiler, no new options are needed

I think the most important point is 2) -- we may need to do it because of the performance reasons anyway.

If we can not do mangling/remagnalling, I don't see the option to require passing symbol list too bad either. Note that llvm-cov tool also requires user to pass the execuable path. We can do the same here -- pass a newly compiled executable to the tool so that the tool can extract the newly named symbols.

rsmith added inline comments.Aug 27 2018, 6:34 PM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

As far as I'm aware there is no library out there that supports demangling, mutating the demangled tree, and then re-mangling. I think it's probably possible to write such a library, but as noted, the engineering costs would be substantial.

As noted in one of the other review threads, we can handle indirect call targets with the current approach if that's necessary, but I would rather add that functionality as a separate step after this patch series.

As for doing this by changing the compiler, I still think that approach is a significantly superior option to requiring complications of the build process by adding an extra profile-data-mutation step. This patch makes it straightforward to make broad changes (such as changing the standard library) and retain applicability of most of the profile data, without requiring detailed knowledge of how each target program is built or how the build system feeds profile data into it, and without adding an extra compilation step to compute the set of new symbol names.

I'm happy to teach llvm-profdata to apply remappings (given a remapping file and a symbol list for the new binary), but I would still like to provide the option to do the remapping on the fly.

davidxl added inline comments.Aug 27 2018, 6:47 PM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

The assessment of the convenience to have the support directly in compiler depends on how often this feature is used. If it is used in daily build/workflow, you are absolutely right that this is more superior. However my view is that ABI breakages like this happen *very* rarely. Introducing compiler option for one-off change does not really bring much benefit. Assuming the functionality exists in llvm_profdata, it seems having the duplicated functionality not necessary.

Besides, there is also the issue of distributed build system and dependency tracking. With this change in compiler, you will also need to invest time and effort to support it in bazel. On the other hand, doing one time profile conversion probably does not require build system support.

rsmith added inline comments.Aug 28 2018, 12:46 PM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

My approach requires no changes to bazel. We need only add a remapping file to the toolchain and a flag naming it to the compilation flags.

Doing profile conversion as a separate step means that whenever we're trying out a large-scale renaming change, we must manually invest in profile conversion, and cannot simply run the existing performance tests. That's the value added by having this feature in the compiler (even if it's also supported by llvm-profdata) -- it makes it easy to try out things that affect a lot of manglings and get an idea of the performance delta without going through the work of updating all the profiles to match.

I'll post a patch to add remapping support to llvm-profdata, but I do not think that fully solves the problem I'm trying to solve, and I'd still like to add the on-the-fly translation to the compiler.

davidxl added inline comments.Sep 11 2018, 9:20 AM
lib/ProfileData/InstrProfReader.cpp
606 ↗(On Diff #162528)

I find it cleaner to make the name mapper a member field of InstrProfIndexReader class. The base mapper can be a dummy mapper while the ItaniumMapper can override the base behavior.

rsmith updated this revision to Diff 168896.Oct 9 2018, 4:04 PM
rsmith marked an inline comment as done.

Rebase and address review comments.

davidxl accepted this revision.Oct 9 2018, 9:45 PM

lgtm

This revision is now accepted and ready to land.Oct 9 2018, 9:45 PM
MaskRay added inline comments.
include/llvm/ProfileData/InstrProfReader.h
351 ↗(On Diff #168896)

Space? template <typename HashTableImpl>

This revision was automatically updated to reflect the committed changes.
rsmith marked an inline comment as done.