PGOFuncNames are used as the key to retrieve the Function definition from the MD5 stored in the profile: (1) We use the InstrProfSymtab to map the MD5 to the PGOFuncName. (2) We use another map to find the Function definition using PGOFuncName. With the above two steps, we can find the direct-call target function in the profile.
This assumes the PGOFuncName is consistent through the above steps.
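The two-step lookup above can be sketched roughly as follows. This is a standalone illustration, not the actual InstrProfSymtab code: `FunctionDef`, `TargetLookup`, and `hashPGOName` are hypothetical names, and `std::hash` stands in for the MD5 so that the sketch has no dependencies.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a Function definition in the module.
struct FunctionDef {
  std::string rawName;
};

// Stand-in for the MD5-based hash stored in the profile. std::hash is NOT
// MD5; it is used only to keep this sketch free of dependencies.
inline std::uint64_t hashPGOName(const std::string &pgoName) {
  return static_cast<std::uint64_t>(std::hash<std::string>{}(pgoName));
}

// Hypothetical container holding the two maps used by the lookup.
struct TargetLookup {
  std::unordered_map<std::uint64_t, std::string> hashToName;       // step (1)
  std::unordered_map<std::string, const FunctionDef *> nameToFunc; // step (2)

  // Register a function under the PGOFuncName computed for it.
  void add(const std::string &pgoName, const FunctionDef *def) {
    assert(def && "expected a function definition");
    hashToName[hashPGOName(pgoName)] = pgoName;
    nameToFunc[pgoName] = def;
  }

  // Map a hash from the profile to a name, then the name to a definition.
  // Returns nullptr if either step fails.
  const FunctionDef *find(std::uint64_t profileHash) const {
    auto nameIt = hashToName.find(profileHash);
    if (nameIt == hashToName.end())
      return nullptr;
    auto funcIt = nameToFunc.find(nameIt->second);
    return funcIt == nameToFunc.end() ? nullptr : funcIt->second;
  }
};
```

If the name used to build the maps differs from the name that was hashed into the profile, `find` returns nullptr even though the function is present in the module.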
LTO's internalization changes many symbols from global linkage to internal linkage, and for internal functions we prefix the source module name to the PGOFuncName.
An example:
foo.c:
  static int foo();  // user-specified internal function
  int bar();         // internalized
  int goo();         // remains global
At pgo-gen/pgo-use/vp-annotation time, the PGOFuncNames are:
  foo() --> foo.c:foo
  bar() --> bar
  goo() --> goo
In LTO optimization, after internalization, they become:
  foo() --> ld-temp.o:foo
  bar() --> ld-temp.o:bar
  goo() --> goo
So if indirect-call promotion is called during LTO optimization, we cannot find the targets for foo() and bar(), because their PGOFuncNames no longer match the names recorded in the profile.
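The naming rule behind this mismatch can be sketched as below. `pgoFuncName` and `Linkage` are hypothetical names for illustration only; the sketch just reproduces the "prefix internal functions with the module name" rule from the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified linkage kinds for this sketch.
enum class Linkage { External, Internal };

// Internal functions get "<module name>:" as a prefix; all other functions
// keep their raw name.
inline std::string pgoFuncName(const std::string &rawName, Linkage linkage,
                               const std::string &moduleName) {
  assert(!rawName.empty() && "function must have a name");
  if (linkage == Linkage::Internal)
    return moduleName + ":" + rawName;
  return rawName;
}
```

With this rule, foo() is named `foo.c:foo` when compiled from foo.c but `ld-temp.o:foo` in the combined module, and bar() goes from `bar` to `ld-temp.o:bar` once it has been internalized.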
We have to do the indirect-call promotion in the combined module, as many of the targets are not intra-module. So we have to work around this. Here are a few solutions that we have thought of:
(1) Perform indirect-call promotion before internalization.
This works, but the internalization is tightly coupled into the LTO framework, so the resulting structure is bad.
(2) Rename the internal linkage function physically into PGOFuncName.
This also works, but it might affect debugging.
(3) Record the source module name in the linker plugin and use it to reconstruct the PGOFuncName for
functions whose internal linkage does not come from internalization. We also need to record the list of
symbols that got internalized (could reuse some maps in LTO codegen). ThinLTO is going to use this method.
(4) In profile-use, attach metadata to the function if its PGOFuncName is different from its raw name. In LTO optimization, use the name from the metadata if it's available. Otherwise, just strip off the source module prefix.
I chose (4) over (3) because of its simplicity. It uses more memory than (3), but I don't expect many user-defined internal functions. If this becomes a real issue, we can switch to (3).
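A minimal sketch of option (4), under stated assumptions: `Func`, `pgoNameMD`, `annotateAtProfileUse`, and `pgoNameInLTO` are hypothetical names, an empty string models "no metadata attached", and the module prefix is assumed to contain no ':'. It is not the code in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function plus its attached metadata.
struct Func {
  std::string rawName;
  bool isInternal;
  std::string pgoNameMD; // empty means "no metadata attached"
};

// Name as computed from the current linkage and module name.
inline std::string computePGOName(const Func &f,
                                  const std::string &moduleName) {
  return f.isInternal ? moduleName + ":" + f.rawName : f.rawName;
}

// Profile-use: record the PGOFuncName only if it differs from the raw name.
inline void annotateAtProfileUse(Func &f, const std::string &moduleName) {
  assert(!f.rawName.empty() && "function must have a name");
  std::string name = computePGOName(f, moduleName);
  if (name != f.rawName)
    f.pgoNameMD = name;
}

// Drop everything up to and including the first ':'.
// Assumption: the module prefix itself contains no ':'.
inline std::string stripModulePrefix(const std::string &pgoName) {
  std::string::size_type pos = pgoName.find(':');
  return pos == std::string::npos ? pgoName : pgoName.substr(pos + 1);
}

// LTO: prefer the recorded name; otherwise strip the source module prefix.
inline std::string pgoNameInLTO(const Func &f,
                                const std::string &ltoModuleName) {
  if (!f.pgoNameMD.empty())
    return f.pgoNameMD;
  return stripModulePrefix(computePGOName(f, ltoModuleName));
}
```

For the example above, foo() carries `foo.c:foo` as metadata and keeps that name in LTO, while the internalized bar() has no metadata, so `ld-temp.o:bar` is stripped back to `bar`. Both then match the names used when the profile was generated.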
Note that this is an issue for both Clang value profile annotation and LLVM value profile annotation. The patch only changes the LLVM value profiling. If this is acceptable, I will change the Clang one later.