This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Utils/UniqueInternalLinkageNames.cpp
llvm/test/Transforms/UniqueInternalLinkageNames/unique_symbol_names.ll

Differential D94154

Unique Internal Linkage Name suffixes must be demangler friendly
Closed (Public)

Authored by tmsriram on Jan 5 2021, 10:30 PM.

Details

Reviewers

dblaikie
hoy

Commits

rGd8c6d24359f1: -funique-internal-linkage-names appends a hex md5hash suffix to the symbol name…

Summary

-funique-internal-linkage-names appends a hex MD5 hash suffix to the symbol name, which is not demangler-friendly; convert it to decimal.

Please see D93747 for more context; that patch tries to make the linkage names of internal-linkage functions the uniquified names. This causes a problem with gdb: breaking on the demangled function name does not work if the new uniquified name cannot be demangled. The culprit is the generated suffix, a mix of digits and letters, which does not demangle; the demangler accepts a suffix made of either all digits or all letters. This patch simply converts the hash to decimal.

There is no loss of uniqueness, because the full precision of the hash is preserved. The symbol names do get a few characters longer, though.
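A minimal sketch of the conversion described above, in standard C++ so it stands alone. It assumes the hash arrives as a hexadecimal string (the way an MD5 digest is usually printed) and converts it with schoolbook arithmetic, so no precision is lost however long the digest is. The name `hexToDecimal` is hypothetical; the patch itself may well use LLVM's own integer utilities instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert an arbitrary-length hexadecimal string (for example a 32-digit
// MD5 digest) to its exact decimal representation. Hypothetical helper.
std::string hexToDecimal(const std::string &Hex) {
  // Base-10 digits of the value accumulated so far, least significant first.
  std::vector<std::uint8_t> Digits{0};
  for (char C : Hex) {
    unsigned Nibble;
    if (C >= '0' && C <= '9')
      Nibble = static_cast<unsigned>(C - '0');
    else if (C >= 'a' && C <= 'f')
      Nibble = static_cast<unsigned>(C - 'a' + 10);
    else if (C >= 'A' && C <= 'F')
      Nibble = static_cast<unsigned>(C - 'A' + 10);
    else
      throw std::invalid_argument("hexToDecimal: not a hex digit");

    // Value = Value * 16 + Nibble, propagated through the decimal digits.
    unsigned Carry = Nibble;
    for (std::uint8_t &D : Digits) {
      unsigned V = D * 16u + Carry;
      D = static_cast<std::uint8_t>(V % 10);
      Carry = V / 10;
    }
    // New high-order digits are only added for a non-zero carry, so the
    // vector never holds leading zeros (except the single digit of zero).
    while (Carry != 0) {
      Digits.push_back(static_cast<std::uint8_t>(Carry % 10));
      Carry /= 10;
    }
  }

  // Emit the most significant digit first.
  std::string Out;
  for (auto It = Digits.rbegin(); It != Digits.rend(); ++It)
    Out.push_back(static_cast<char>('0' + *It));
  return Out;
}
```

For a 128-bit digest the largest value, `ffffffffffffffffffffffffffffffff`, becomes `340282366920938463463374607431768211455`: 32 hex digits turn into at most 39 decimal digits, which is the "few characters" of growth mentioned above.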

Event Timeline

tmsriram created this revision. Jan 5 2021, 10:30 PM

Herald added a subscriber: hiraditya. Jan 5 2021, 10:30 PM

tmsriram requested review of this revision. Jan 5 2021, 10:30 PM

Herald added a project: Restricted Project. Jan 5 2021, 10:30 PM

Herald added a subscriber: llvm-commits.

Thanks for the quick fix!

llvm/test/Transforms/UniqueInternalLinkageNames/unique_symbol_names.ll
Line 13: The clang side test `unique-internal-linkage-names.cpp` may also need to be changed.

tmsriram added a child revision: D93747: Rename debug linkage name with -funique-internal-linkage-names. Jan 5 2021, 10:39 PM

This is only demangler-friendly if the name is already an Itanium-mangled name, right? (i.e., the name starts with _Z.) So it wouldn't work for C code?

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name and then add the suffix (at least for Itanium mangling - not sure if Windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for __attribute__((overloadable)) and others (I think __attribute__((enable_if)) also causes mangling of C functions))

In D94154#2481491, @dblaikie wrote:

This is only demangler-friendly if the name is already an Itanium-mangled name, right? (i.e., the name starts with _Z.) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name and then add the suffix (at least for Itanium mangling - not sure if Windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for __attribute__((overloadable)) and others (I think __attribute__((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware. The best way I can think of would be to pretend that internal linkage things are in an anonymous namespace. IIRC there used to be ways to embed something unique into that part of the mangling.

That all said, I don't see any reason to block this decimalization change. It's limited in scope and an obvious improvement.
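To make the pass-through behaviour concrete, here is a minimal sketch of a c++filt-style wrapper. It assumes a GCC or Clang toolchain, where `abi::__cxa_demangle` from `<cxxabi.h>` is available, and it checks only for the plain `_Z` prefix (it ignores the extra leading underscore some platforms add). The wrapper name `demangleOrPassThrough` is hypothetical.

```cpp
#include <cstdlib>
#include <string>
#include <cxxabi.h> // GCC/Clang C++ ABI runtime, not ISO C++

// Demangle Itanium names; hand anything else back unchanged, the way
// c++filt treats plain C symbols such as "foo" or "foo.1234".
std::string demangleOrPassThrough(const std::string &Name) {
  if (Name.rfind("_Z", 0) != 0) // does not start with "_Z": not mangled
    return Name;

  int Status = 0;
  char *Demangled =
      abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Demangled == nullptr) {
    std::free(Demangled); // free(nullptr) is a no-op
    return Name;          // looks mangled but is not demangleable
  }
  std::string Result(Demangled);
  std::free(Demangled); // __cxa_demangle allocates the buffer with malloc
  return Result;
}
```

A C name such as `foo.1234` comes back untouched; as the rest of the thread discusses, the trouble arises where a debugger expects a name to demangle and it does not.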

In D94154#2481491, @dblaikie wrote:

This is only demangler-friendly if the name is already an Itanium-mangled name, right? (i.e., the name starts with _Z.) So it wouldn't work for C code?

Right. But in this case, a DW_AT_linkage_name is never generated; the DW_AT_name uses the C-style symbol name. This proves we should not force a DW_AT_linkage_name with uniquified names if the field was null to begin with. Does that resolve this problem? In the "overloadable" scenario, the linkage name is mangled, so there is no problem.

tmsriram mentioned this in D93747: Rename debug linkage name with -funique-internal-linkage-names. Jan 6 2021, 10:16 AM

Fix a clang test to reflect the new change.

https://github.com/gcc-mirror/gcc/blob/master/libiberty/cp-demangle.c#L3863

This is the demangler code for suffixes; it accepts either lowercase letters and underscores, or digits.
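As a rough illustration of why the hex suffix failed, here is a simplified model of the rule described above. It is written from this thread's description rather than copied from libiberty, so the real parser may differ in details (and may have changed since). It accepts groups such as `.__part.1` or `.12345` and rejects a segment that mixes letters and digits, such as a hex hash. `isDemanglerFriendlySuffix` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Simplified model (an assumption, not the libiberty source): a suffix is a
// sequence of groups, each either ".[a-z_]+" followed by zero or more
// ".[0-9]+" segments, or one or more ".[0-9]+" segments on their own.
bool isDemanglerFriendlySuffix(const std::string &Suffix) {
  auto IsLowerOrUnderscore = [](char C) {
    return (C >= 'a' && C <= 'z') || C == '_';
  };
  auto IsDigit = [](char C) { return C >= '0' && C <= '9'; };

  std::size_t Pos = 0;
  const std::size_t N = Suffix.size();
  while (Pos < N) {
    if (Suffix[Pos] != '.' || Pos + 1 >= N)
      return false;
    bool Consumed = false;
    // Optional alphabetic segment such as ".__part" or ".__uniq".
    if (IsLowerOrUnderscore(Suffix[Pos + 1])) {
      Pos += 2;
      while (Pos < N && IsLowerOrUnderscore(Suffix[Pos]))
        ++Pos;
      Consumed = true;
    }
    // Zero or more purely numeric segments such as ".1" or ".12345".
    while (Pos + 1 < N && Suffix[Pos] == '.' && IsDigit(Suffix[Pos + 1])) {
      Pos += 2;
      while (Pos < N && IsDigit(Suffix[Pos]))
        ++Pos;
      Consumed = true;
    }
    if (!Consumed)
      return false; // '.' followed by something the model does not accept
  }
  return true;
}
```

Under this model `.a3f9` is rejected (the letter run stops at `3`, leaving `3f9` unparsed) while a purely decimal suffix is accepted, which matches the motivation for this patch.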

In D94154#2483336, @tmsriram wrote:

https://github.com/gcc-mirror/gcc/blob/master/libiberty/cp-demangle.c#L3863

This is the demangler code for suffixes; it accepts either lowercase letters and underscores, or digits.

Thanks for pointing that out. And we already do something like that during LTO symbol promotion: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1376

hoy accepted this revision. Jan 6 2021, 4:40 PM

This revision is now accepted and ready to land. Jan 6 2021, 4:40 PM

In D94154#2482425, @rnk wrote:

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimalization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

In D94154#2485197, @dblaikie wrote:

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

In D94154#2486081, @hoy wrote:

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

How's this work for C functions? Same sort of problem?

In D94154#2487727, @dblaikie wrote:

How's this work for C functions? Same sort of problem?

Yes, the same problem exists for C functions. For C functions without DW_AT_linkage_name, AutoFDO uses DW_AT_name, which is not uniquified.

In D94154#2487745, @hoy wrote:

Yes, the same problem exists for C functions. For C functions without DW_AT_linkage_name, AutoFDO uses DW_AT_name, which is not uniquified.

And it can't be added because that'd break debuggers by giving them a (correct) linkage name that they couldn't demangle.

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheme proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does the Windows mangling scheme have a notion of suffixes, etc. - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

In D94154#2487762, @dblaikie wrote:

And it can't be added because that'd break debuggers by giving them a (correct) linkage name that they couldn't demangle.

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheme proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does the Windows mangling scheme have a notion of suffixes, etc. - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, no linkage name is added, and none will be newly introduced either: Hoy's patch specifically updates the linkage name only when it is not null. When "overloadable" is used, a mangled linkage name is added for C functions too, and with the new suffix it remains demangleable. I don't see any issues.
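The rule stated here (rewrite DW_AT_linkage_name only when one already exists) can be modelled with a toy record. `SubprogramInfo` and `applyUniqueSuffix` are hypothetical names for illustration only; this is not LLVM's DISubprogram API, and the real change operates on IR metadata.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for a function's debug-info record.
struct SubprogramInfo {
  std::string Name;                       // models DW_AT_name, e.g. "foo"
  std::optional<std::string> LinkageName; // models DW_AT_linkage_name, if any
};

// Always rename the symbol, but only extend the linkage name if one was
// already present. A plain C function (no linkage name) keeps just its
// DW_AT_name, so no undemangleable linkage name is introduced.
void applyUniqueSuffix(SubprogramInfo &SP, std::string &SymbolName,
                       const std::string &Suffix) {
  SymbolName += Suffix;
  if (SP.LinkageName)
    *SP.LinkageName += Suffix;
}
```

The trade-off is the one raised earlier in the thread: the C function's debug info then carries no record of its uniquified symbol name, which is what AutoFDO would have wanted.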

In D94154#2487891, @tmsriram wrote:

We are going in circles. For C functions, no linkage name is added, and none will be newly introduced either: Hoy's patch specifically updates the linkage name only when it is not null. When "overloadable" is used, a mangled linkage name is added for C functions too, and with the new suffix it remains demangleable. I don't see any issues.

I think David already approved the patch in Phabricator; feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

Regarding David's suggestion to only add suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.
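rnk's point about plain C statics can be illustrated with a small sketch: two TUs each define `static foo`, and appending a per-module hash gives each a distinct symbol. This sketch swaps in 64-bit FNV-1a for the MD5 hash mentioned in the summary, purely to stay dependency-free, and assumes the module identifier (for example the source file name) is what gets hashed. `fnv1a64` and `uniquifyInternalName` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Stand-in hash (64-bit FNV-1a), NOT the MD5 hash the real pass uses.
std::uint64_t fnv1a64(const std::string &Data) {
  std::uint64_t H = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char C : Data) {
    H ^= C;
    H *= 0x100000001b3ULL; // FNV prime
  }
  return H;
}

// Append a decimal hash of the defining module's identifier, so the same
// static name in two TUs yields two different symbols. The "." separator
// and hashing the module identifier are assumptions for illustration.
std::string uniquifyInternalName(const std::string &Name,
                                 const std::string &ModuleId) {
  return Name + "." + std::to_string(fnv1a64(ModuleId));
}
```

For example, `uniquifyInternalName("foo", "a.c")` and `uniquifyInternalName("foo", "b.c")` produce different, purely decimal suffixes. Only the symbol changes here; whether the debugger can still map `foo.<digits>` back to `foo` is exactly the open question in this thread.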

In D94154#2488088, @rnk wrote:

I think David already approved the patch in Phabricator; feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

No worries at all; David has a valid point and I agree with the C issue he brought up.

Regarding David's suggestion to only add suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

I don't think there is a correctness issue, but there is a lost opportunity with C internal linkage functions. The way to solve this would be to generate the mangled linkage name of the function when unique names are needed. But that would mean the unique names feature should be done early in clang and not as a pass. It looks like David Blaikie is right here and I tend to agree. What do you think, @rnk? If that is alright, I can resurrect the old patch.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.
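
The requirement rnk describes in the quote above is that every TU's `static foo` ends up with a distinct name. The following is a minimal sketch of that property, not the pass itself. The pass in the diff below hashes `M.getSourceFileName()` with MD5. Here a 64-bit FNV-1a hash stands in for MD5 only so that the example needs nothing beyond the standard library. `uniqueInternalName` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in hash: 64-bit FNV-1a. The real pass uses a 128-bit MD5 of the
// module's source file name; FNV-1a is used here only to stay stdlib-only.
std::uint64_t fnv1a64(const std::string &s) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
  for (unsigned char c : s) {
    h ^= c;                  // mix in one byte
    h *= 1099511628211ULL;   // FNV-1a prime (unsigned wraparound is intended)
  }
  return h;
}

// Hypothetical helper mirroring the naming scheme in the diff:
// "<name>.__uniq.<decimal hash of the source file name>".
std::string uniqueInternalName(const std::string &name,
                               const std::string &sourceFile) {
  assert(!name.empty() && "internal symbol must have a name");
  return name + ".__uniq." + std::to_string(fnv1a64(sourceFile));
}
```

For example, `uniqueInternalName("foo", "a.c")` and `uniqueInternalName("foo", "b.c")` produce different names for the two `foo`s. The same file name always maps to the same suffix. Because the source file name is the only input to the hash, two TUs that record an identical source file name would still get identical suffixes under this scheme.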

In D94154#2488088, @rnk wrote:

In D94154#2487891, @tmsriram wrote:

In D94154#2487762, @dblaikie wrote:

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheme proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does the Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, no linkage name is added and none will be newly introduced either; Hoy's patch specifically updates the linkage name only when it is not null. When "overloadable" is used, a mangled linkage name is added for C functions too, and the suffixed name will also be demangleable. I don't see any issues.

I think David already approved the patch in Phabricator; feel free to land it. David isn't blocking the patch. I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

Regarding David's suggestion to only add suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

For static C functions, if we are going to rename them in Clang, can a mangled debug linkage name be assigned to them as if they are decorated with the overloadable attribute?

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.

Closed by commit rGd8c6d24359f1: -funique-internal-linkage-names appends a hex md5hash suffix to the symbol name… (authored by tmsriram). Jan 11 2021, 11:10 AM

This revision was automatically updated to reflect the committed changes.

tmsriram added a commit: rGd8c6d24359f1: -funique-internal-linkage-names appends a hex md5hash suffix to the symbol name….

Herald added a project: Restricted Project. Jan 11 2021, 11:10 AM

Herald added a subscriber: cfe-commits.

Revision Contents

Path | Size
llvm/lib/Transforms/Utils/UniqueInternalLinkageNames.cpp | 5 lines
llvm/test/Transforms/UniqueInternalLinkageNames/unique_symbol_names.ll | 4 lines

Diff 314798

llvm/lib/Transforms/Utils/UniqueInternalLinkageNames.cpp

(21 unchanged lines above not shown)

 static bool uniqueifyInternalLinkageNames(Module &M) {
   llvm::MD5 Md5;
   Md5.update(M.getSourceFileName());
   llvm::MD5::MD5Result R;
   Md5.final(R);
   SmallString<32> Str;
   llvm::MD5::stringifyResult(R, Str);
+  // Convert MD5hash to Decimal. Demangler suffixes can either contain numbers
+  // or characters but not both.
+  APInt IntHash = APInt(128, Str.str(), 16);
   // Prepend "__uniq" before the hash for tools like profilers to understand that
   // this symbol is of internal linkage type.
-  std::string ModuleNameHash = (Twine(".__uniq.") + Twine(Str)).str();
+  std::string ModuleNameHash = (Twine(".__uniq.") + Twine(IntHash.toString(10, false))).str();
   bool Changed = false;
 
   // Append the module hash to all internal linkage functions.
   for (auto &F : M) {
     if (F.hasInternalLinkage()) {
       F.setName(F.getName() + ModuleNameHash);
       Changed = true;
     }

(59 unchanged lines below not shown)
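
The patch uses `APInt(128, Str.str(), 16)` followed by `toString(10, false)` to turn the 32-digit hex MD5 text into decimal. The following standard-library-only sketch performs the same conversion. `hexToDecimal` is a hypothetical name. It uses schoolbook long division by 10 over base-16 digits, not LLVM's `APInt` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Converts an unsigned hexadecimal string (e.g. the text produced by
// llvm::MD5::stringifyResult) to decimal by repeatedly dividing the
// base-16 digit sequence by 10 and collecting the remainders.
std::string hexToDecimal(const std::string &hex) {
  assert(!hex.empty());
  std::vector<int> digits;  // base-16 digits, most significant first
  for (char ch : hex) {
    unsigned char c = static_cast<unsigned char>(ch);
    assert(std::isxdigit(c) && "input must be hexadecimal");
    digits.push_back(std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
  }
  std::string dec;  // decimal digits, least significant first
  while (!digits.empty()) {
    std::vector<int> quotient;  // base-16 digits of (value / 10)
    int rem = 0;
    for (int d : digits) {
      int cur = rem * 16 + d;  // at most 9 * 16 + 15 = 159
      int q = cur / 10;        // always < 16, so a valid base-16 digit
      rem = cur % 10;
      if (!quotient.empty() || q != 0) quotient.push_back(q);  // drop leading zeros
    }
    dec.push_back(static_cast<char>('0' + rem));  // remainder = next decimal digit
    digits.swap(quotient);
  }
  std::reverse(dec.begin(), dec.end());
  return dec;
}
```

For the hash in the test below, `hexToDecimal("6ae72bb15a7d1834b42ae042a58f7a4d")` returns `142098474322525230676991677820000238157`, the value in the updated CHECK lines. The conversion needs arbitrary precision (or a 128-bit `APInt`) because the hash does not fit in a 64-bit integer. Since every bit is carried through, the decimal form is exactly as unique as the hex form.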

llvm/test/Transforms/UniqueInternalLinkageNames/unique_symbol_names.ll

 ; RUN: opt -S -unique-internal-linkage-names < %s | FileCheck %s
 ; RUN: opt -S -passes=unique-internal-linkage-names < %s | FileCheck %s
 
 source_filename = "foo.c"
 
 @glob = internal global i32 0
 
 define internal i32 @foo() {
 entry:
   ret i32 0
 }
 
-; CHECK: @glob.__uniq.6ae72bb15a7d1834b42ae042a58f7a4d = internal global
+; CHECK: @glob.__uniq.142098474322525230676991677820000238157 = internal global
    hoy (inline comment on this line): The clang side test `unique-internal-linkage-names.cpp` may also need to be changed.
-; CHECK: define internal i32 @foo.__uniq.6ae72bb15a7d1834b42ae042a58f7a4d()
+; CHECK: define internal i32 @foo.__uniq.142098474322525230676991677820000238157()
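
The comment in the diff states the constraint that the decimal suffix satisfies: a demangler suffix component can contain digits or letters, but not both. The following is a rough model of that rule, and it is an assumption rather than a description of the real implementation. `isDemanglerFriendlySuffix` is hypothetical. It treats `_` as a letter so that `__uniq` qualifies, and it is far simpler than LLVM's actual Itanium demangler.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Simplified model of the rule quoted in the patch: every "."-separated
// component of the suffix must be non-empty and consist either entirely of
// digits or entirely of letters/underscores, never a mix of the two.
bool isDemanglerFriendlySuffix(const std::string &suffix) {
  assert(!suffix.empty() && suffix[0] == '.' && "suffix must start with '.'");
  std::size_t pos = 1;
  while (pos <= suffix.size()) {
    std::size_t end = suffix.find('.', pos);
    if (end == std::string::npos) end = suffix.size();
    std::string part = suffix.substr(pos, end - pos);
    if (part.empty()) return false;  // ".." or a trailing '.'
    bool allDigits = std::all_of(part.begin(), part.end(), [](unsigned char c) {
      return std::isdigit(c) != 0;
    });
    bool allAlpha = std::all_of(part.begin(), part.end(), [](unsigned char c) {
      return std::isalpha(c) != 0 || c == '_';
    });
    if (!allDigits && !allAlpha) return false;  // mixed component, e.g. "6ae7..."
    pos = end + 1;
  }
  return true;
}
```

Under this model, the new suffix `.__uniq.142098474322525230676991677820000238157` passes. The old `.__uniq.6ae72bb15a7d1834b42ae042a58f7a4d` fails because its last component mixes digits and letters, which is the behavior the patch sets out to avoid.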