Page MenuHomePhabricator

Prepend "uniq" to symbol names hash with -funique-internal-linkage-names
ClosedPublic

Authored by tmsriram on Oct 16 2020, 11:57 PM.

Details

Summary

TLDR; Prepend the module name hash with a fixed string ".__uniq." which helps tools that consume sampled profiles and attribute it to functions to understand that this symbol belongs to a unique internal linkage type symbol.

Symbols with suffixes can result from various optimizations in the compiler. Function Multiversioning, function splitting, parameter constant propogation, unique internal linkage names.

External tools like sampled profile aggregators combine profiles from multiple runs of a binary. They use various heuristics with symbols that have suffixes to try and attribute the profile to the right function instance. For instance multi-versioned symbols like foo.avx, foo.sse4.2, etc even though different should be attributed to the same source function if a single function is versioned, using attribute target_clones (supported in GCC but yet to land in LLVM). Similarly, functions that are split (split part having a .cold suffix) could have profiles for both the original and split symbols but would be aggregated and attributed to the original function that was split.

Unique internal linkage functions however have different source instances and the aggregator must not put them together but attribute it to the appropriate function instance. To be sure that we are dealing with a symbol of a unique internal linkage function, we would like to prepend the hash with a known string ".__uniq." which these tools can check to understand the suffix type.

Diff Detail

Event Timeline

tmsriram created this revision.Oct 16 2020, 11:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2020, 11:57 PM
tmsriram requested review of this revision.Oct 16 2020, 11:57 PM
tmsriram retitled this revision from Append "uniq" before symbol names hash with -funique-internal-linkage-names to Prepend "uniq" to symbol names hash with -funique-internal-linkage-names.Oct 17 2020, 12:05 AM
mtrofin added inline comments.Oct 17 2020, 8:05 AM
llvm/lib/Transforms/Utils/UniqueInternalLinkageNames.cpp
32

would .uniq. (so a dot before and after) fit more with existing other cases (like .llvm.)?

tmsriram updated this revision to Diff 298833.Oct 17 2020, 9:47 AM
tmsriram marked an inline comment as done.

Prepend ".uniq." to the module name hash.

As mtrofin@ observerd, ".llvm." is prepended to the suffix of global names that are promoted from local in ModuleSummaryIndex and this is consistent with that.

llvm/lib/Transforms/Utils/UniqueInternalLinkageNames.cpp
32

Thanks, .llvm. is indeed used when creating a promoted global name for a local with its original module ID. So, .uniq. would be consistent with that.

tmsriram updated this revision to Diff 299127.Oct 19 2020, 12:26 PM

Incorporating snehasishk@'s suggestion that "llvmuniq" is a better prefix for tools to search on.

Having "llvmuniq" makes reduces chances of collisions with other suffixes that might get introduced. Consistent with the ".llvm." prefix for lto global name promotion suffixes.

Incorporating snehasishk@'s suggestion that "llvmuniq" is a better prefix for tools to search on.

Having "llvmuniq" makes reduces chances of collisions with other suffixes that might get introduced. Consistent with the ".llvm." prefix for lto global name promotion suffixes.

Unique-ing names this way isn't, at core, something that's too tied to llvm, even if we implement it here first. There are tools that would leverage this feature by binding to the suffix. With a look at other present or future compilers, and given that making this decision now is free, and much more expensive later, I think it would be beneficial if we were intentional about:

  • not using "llvm" to indicate this is something tool-generated, and instead use, say "" (".uniq.")

or

  • using "llvm", on the grounds that there are other examples of general-purpose concepts that have tool-specific prefixes (like gcc_except_table, that is supported in llvm); so the prefixed name is basically a proper noun, it doesn't mean "llvm only" or "gcc only".

I believe either way, the ".llvm." case is going to stick out as an inconsistency: it should have been either ".llvmlto." or "__lto", in either case, I don't think it serves too well as an argument for the llvm prefix.

Incorporating snehasishk@'s suggestion that "llvmuniq" is a better prefix for tools to search on.

Having "llvmuniq" makes reduces chances of collisions with other suffixes that might get introduced. Consistent with the ".llvm." prefix for lto global name promotion suffixes.

Unique-ing names this way isn't, at core, something that's too tied to llvm, even if we implement it here first. There are tools that would leverage this feature by binding to the suffix. With a look at other present or future compilers, and given that making this decision now is free, and much more expensive later, I think it would be beneficial if we were intentional about:

  • not using "llvm" to indicate this is something tool-generated, and instead use, say "" (".uniq.")

or

  • using "llvm", on the grounds that there are other examples of general-purpose concepts that have tool-specific prefixes (like gcc_except_table, that is supported in llvm); so the prefixed name is basically a proper noun, it doesn't mean "llvm only" or "gcc only".

I believe either way, the ".llvm." case is going to stick out as an inconsistency: it should have been either ".llvmlto." or "__lto", in either case, I don't think it serves too well as an argument for the llvm prefix.

Acknowledged and great points. I just want to add that we should pick something that would minimize collisions with future ISA names that would land as suffixes with multiversion symbols.

Incorporating snehasishk@'s suggestion that "llvmuniq" is a better prefix for tools to search on.

Having "llvmuniq" makes reduces chances of collisions with other suffixes that might get introduced. Consistent with the ".llvm." prefix for lto global name promotion suffixes.

Unique-ing names this way isn't, at core, something that's too tied to llvm, even if we implement it here first. There are tools that would leverage this feature by binding to the suffix. With a look at other present or future compilers, and given that making this decision now is free, and much more expensive later, I think it would be beneficial if we were intentional about:

  • not using "llvm" to indicate this is something tool-generated, and instead use, say "" (".uniq.")

or

  • using "llvm", on the grounds that there are other examples of general-purpose concepts that have tool-specific prefixes (like gcc_except_table, that is supported in llvm); so the prefixed name is basically a proper noun, it doesn't mean "llvm only" or "gcc only".

I believe either way, the ".llvm." case is going to stick out as an inconsistency: it should have been either ".llvmlto." or "__lto", in either case, I don't think it serves too well as an argument for the llvm prefix.

Acknowledged and great points. I just want to add that we should pick something that would minimize collisions with future ISA names that would land as suffixes with multiversion symbols.

Ack - I'd propose "__" then, it very likely won't collide, it's compiler-independent, and (arguably) suggests tooling as origin.

tmsriram updated this revision to Diff 300097.Oct 22 2020, 1:49 PM

Change the prefix to "__uniq" like mtrofin@ suggested.

mtrofin accepted this revision.Oct 22 2020, 2:04 PM

lgtm

This revision is now accepted and ready to land.Oct 22 2020, 2:04 PM
tmsriram edited the summary of this revision. (Show Details)Oct 22 2020, 8:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Oct 26, 2:24 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript