This is an archive of the discontinued LLVM Phabricator instance.

Unique Internal Linkage Name suffixes must be demangler friendly
ClosedPublic

Authored by tmsriram on Jan 5 2021, 10:30 PM.

Details

Summary

-funique-internal-linkage-names appends a hex md5hash suffix to the symbol name which is not demangler friendly, convert it to decimal.

Please see D93747 for more context which tries to make linkage names of internal linkage functions to be the uniqueified names. This causes a problem with gdb because breaking using the demangled function name will not work if the new uniqueified name cannot be demangled. The problem is the generated suffix which is a mix of integers and letters which do not demangle. The demangler accepts either all numbers or all letters. This patch simply converts the hash to decimal.

There is no loss of uniqueness by doing this as the precision is maintained. The symbol names get longer by a few characters though.

Diff Detail

Event Timeline

tmsriram created this revision.Jan 5 2021, 10:30 PM
tmsriram requested review of this revision.Jan 5 2021, 10:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 10:30 PM
hoy added a comment.Jan 5 2021, 10:36 PM

Thanks for the quick fix!

llvm/test/Transforms/UniqueInternalLinkageNames/unique_symbol_names.ll
13

The clang side test unique-internal-linkage-names.cpp may also need to be changed.

dblaikie added a subscriber: rnk.Jan 6 2021, 12:01 AM

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

rnk added a comment.Jan 6 2021, 9:37 AM

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware. The best way I can think of would be to pretend that internal linkage things are in an anonymous namespace. IIRC there used to be ways to embed something unique into that part of the mangling.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

Right. But in this case, a DW_AT_linkage_name is never generated. The DW_AT_name uses the C style symbol name. This proves we should not force a DW_AT_linkage_name with uniqueified names if the field was null to begin with, does that resolve this problem? In the "overloadable" scenario, the linkage name is mangled so there is no problem.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

! In D94154#2481491, @dblaikie wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

tmsriram updated this revision to Diff 314938.Jan 6 2021, 10:43 AM
tmsriram marked an inline comment as done.

Fix a clang test to reflect the new change.

https://github.com/gcc-mirror/gcc/blob/master/libiberty/cp-demangle.c#L3863

The demangler code for suffixes that accepts either lowercase letters with underscore or numbers.

hoy added a comment.Jan 6 2021, 4:40 PM

https://github.com/gcc-mirror/gcc/blob/master/libiberty/cp-demangle.c#L3863

The demangler code for suffixes that accepts either lowercase letters with underscore or numbers.

Thanks for pointing out. And we already do something like that during LTO symbol promotion: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/ModuleSummaryIndex.h#L1376

hoy accepted this revision.Jan 6 2021, 4:40 PM
This revision is now accepted and ready to land.Jan 6 2021, 4:40 PM
dblaikie accepted this revision.Jan 7 2021, 12:32 PM
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

hoy added a comment.Jan 7 2021, 10:42 PM
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

In D94154#2486081, @hoy wrote:
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

How's this work for C functions? Same sort of problem?

hoy added a comment.Jan 8 2021, 1:18 PM
In D94154#2486081, @hoy wrote:
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

How's this work for C functions? Same sort of problem?

Yes, same problem exists for C functions. For C functions without DW_AT_linkage_name, DW_AT_name will be used by AutoFDO and that's not uniquefied.

In D94154#2487745, @hoy wrote:
In D94154#2486081, @hoy wrote:
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

How's this work for C functions? Same sort of problem?

Yes, same problem exists for C functions. For C functions without DW_AT_linkage_name, DW_AT_name will be used by AutoFDO and that's not uniquefied.

And it can't be added because that'd break debuggers by giving them a (correct) linkage name that they couldn't demangle.

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

In D94154#2487745, @hoy wrote:
In D94154#2486081, @hoy wrote:
In D94154#2482425, @rnk wrote:

This is only demangler friendly if the name is already an itanium mangled name, right? (ie: the name starts with _Z) So it wouldn't work for C code?

I'm pretty sure most demangling tools such as c++filt check for a leading prefix of _+Z before demangling, so I don't think there are any concerns for C names. A reasonable demangler would pass them through as is.

Yep - but passing them through unmodified was causing problems for gdb which was demangling the mangled names as they appear in the DWARF and using the unmangled name to figure out how to do name lookup. So if the symbol wasn't getting unmangled you wouldn't be able to "break f1" instead you'd have to "break f1.__part.1" which would be an unfortunate regression in debuggability.

But it seems like that only applies if a mangled name is present in the DWARF at all - if no mangled name is present, and the debug info just gives the pretty name it works OK. Bit weird to have no record of the real symbol name in the DWARF, but so far it doesn't seem to cause any problems? So I'm OK-ish with this.

Do you have plans to fix this more generally? (I think to fix it more generally, you might need to move this feature up to clang and have clang mangle the name then add the suffix (at least for itanium mangling - not sure if windows mangling supports arbitrary suffixes like this, @rnk might know) - that way for C functions you can force/enable the mangling as is done for attribute((overloadable)) and others (I think attribute((enable_if)) also causes mangling of C functions))

That was Sriram's original idea: have the mangler do it. I reviewed the code. Modifying the mangler was fairly complicated, so I suggested doing it in a pass. The original pass also operated by appending a suffix after mangling, it just happened earlier. There isn't a good way to "embed" this uniquification into the Itanium mangling scheme, so far as I am aware.

Oh, I was still in favor of adding it after the mangling (the current "mangled.__part.number") but I thought it may be necessary to force mangling on C functions before adding the suffix, if they needed to be demangleable for debug info purposes.

That all said, I don't see any reason to block this decimilization change. It's limited in scope and an obvious improvement.

My concern was that it was layering more workarounds on a patch series that might be going in the wrong direction overall.

Anyway, sounds like, if I'm understanding/tested correctly, that the issue with unmangleability isn't about the symbol name itself but the DW_AT_linkage_name in the DWARF, so it seems like if that isn't present then there's no issue if the real symbol name can't be unmangled/back to the simple name. (but there's some reason the DW_AT_linkage_name, if present, must match the symbol name? (so it can't be the original unsuffixed mangled name))

One reason that DW_AT_linkage_name should be consistent with the real linkage name is to favor AutoFDO. DW_AT_linkage_name is used to generate profiles for inlined functions.

How's this work for C functions? Same sort of problem?

Yes, same problem exists for C functions. For C functions without DW_AT_linkage_name, DW_AT_name will be used by AutoFDO and that's not uniquefied.

And it can't be added because that'd break debuggers by giving them a (correct) linkage name that they couldn't demangle.

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, there is no linkage name added and it will not be newly introduced either, Hoy's patch specifically updates linkage name only when it is not null. When "overloadable" is used, linkage name is added as a mangled name for C functions too and the suffix will also make it demangleable. I don't see any issues.

rnk added a comment.Jan 8 2021, 3:41 PM

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, there is no linkage name added and it will not be newly introduced either, Hoy's patch specifically updates linkage name only when it is not null. When "overloadable" is used, linkage name is added as a mangled name for C functions too and the suffix will also make it demangleable. I don't see any issues.

I think David already approved the patch in Phabricator, feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

Regarding David's suggestion to only added suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.

In D94154#2488088, @rnk wrote:

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, there is no linkage name added and it will not be newly introduced either, Hoy's patch specifically updates linkage name only when it is not null. When "overloadable" is used, linkage name is added as a mangled name for C functions too and the suffix will also make it demangleable. I don't see any issues.

I think David already approved the patch in Phabricator, feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

No worries at all, David has a valid point and I agree with the C issue he brought up.

Regarding David's suggestion to only added suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

I don't think there is a correctness issue but there is an opportunity lost with C internal linkage functions. The way to solve this would be to generate the mangled linkage name of the function when unique names is needed. But, that would mean the unique names feature should be done early in clang and not as a pass. It looks like David Blaikie is right here and I tend to agree. What do you think @rnk ? If that is alright, I can resurrect the old patch.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.

hoy added a comment.Jan 8 2021, 3:56 PM
In D94154#2488088, @rnk wrote:

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, there is no linkage name added and it will not be newly introduced either, Hoy's patch specifically updates linkage name only when it is not null. When "overloadable" is used, linkage name is added as a mangled name for C functions too and the suffix will also make it demangleable. I don't see any issues.

I think David already approved the patch in Phabricator, feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

Regarding David's suggestion to only added suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

For static C functions, if we are going to rename them in Clang, can a mangled debug linkage name be assigned to them as if they are decorated with the overloadable attribute?

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.

In D94154#2488088, @rnk wrote:

@rnk so I think this ^ leads us back to the "maybe this should be done in clang where we can enable mangling for C static functions when needed/when this feature is enabled". (I don't mean that we should add this scheme into the mangling scheem proper - I'm still OK with it being a ".part.N" suffix - but that that suffix should only be added to a mangled name, not to an unmangled name like a plain C function name) - not sure how that works for Windows (does Windows mangling scheme have a notion of suffixes, etc - I guess this system as implemented assumes Windows is cool with ".foo.N" suffixes?).

We are going in circles. For C functions, there is no linkage name added and it will not be newly introduced either, Hoy's patch specifically updates linkage name only when it is not null. When "overloadable" is used, linkage name is added as a mangled name for C functions too and the suffix will also make it demangleable. I don't see any issues.

I think David already approved the patch in Phabricator, feel free to land it. David isn't blocking the patch, I think he's suggesting that perhaps we should revisit the IR pass design decision in the near future.

Regarding David's suggestion to only added suffixes to mangled names, wouldn't that be problematic for plain C static functions with no mangling? Suppose you have a program with many C TUs, each of which contains a static free function foo. In order for AutoFDO to work, the compiler has to rename all those foo's to something globally unique. That will probably break some debugging functionality, but hey, the user asked for unique names, so they got them.

Regarding the Microsoft mangling scheme, it is possible that these names will not demangle. Thinking about it more now, in the MS name mangling scheme, clang already tries to generate globally unique names for C++ things with internal linkage (lambdas, anonymous namespaces, etc). It is possible to create entities with non-unique names with unnamed tag types and C static functions, but for the debugger's sake, the mangling scheme makes a best effort to generate globally unique names. This seems like a good reason to actually move this functionality back to the frontend mangler, since only the frontend knows which internal linkage functions already have globally unique IDs embedded in them.

Anyway, I apologize for any confusing or contradictory feedback I may have given. I'm trying to unblock folks, but this doesn't have my full attention.

Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2021, 11:10 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript