This is an archive of the discontinued LLVM Phabricator instance.

[Mangle] Add flag to asm labels to disable '\01' prefixing
ClosedPublic

Authored by vsk on Sep 19 2019, 12:57 PM.

Details

Summary

LLDB synthesizes decls using asm labels. These decls cannot have a mangle
different than the one specified in the label name. I.e., the '\01' prefix
should not be added.

Fixes an expression evaluation failure in lldb's TestVirtual.py on iOS.

rdar://45827323

Diff Detail

Event Timeline

vsk created this revision.Sep 19 2019, 12:57 PM
rjmccall added inline comments.Sep 19 2019, 1:30 PM
clang/lib/AST/Mangle.cpp
140

This is one of those bugs where looking at the code really reveals a lot of problematic behavior.

AsmLabelAttr is generally assumed to carry a literal label, i.e. it should not have the global prefix added to it. We should not be changing that assumption globally just based on whether an ExternalASTSource has been set up; that's going to break in any sort of non-uniform situation, e.g. if an LLDB user writes a declaration with an asm label, or if we import a Clang module where a declaration has an asm label. Either ExternalASTSources should be putting the real (prefixed) symbol name in the AsmLabelAttr, or they should be making AsmLabelAttrs that are different somehow, e.g. by giving AsmLabelAttr a flag (which doesn't need to be spellable in the source language) saying whether the label is literal or subject to global prefixing.

Separately, it seems to me that if the AsmLabelAttr is literal but the label happens to include the global prefix, we should strip the prefix from the label and not add a \01 so that we get a consistent symbol name in LLVM.

teemperor requested changes to this revision.Sep 19 2019, 1:46 PM

I wonder what's the motivation for making this a setting in the ExternalASTSource? Opposed to e.g. storing a bit in AsmLabelAttr, which we could squeeze in by maybe stealing a bit from labelLength? Having this as a global setting in the ExternalASTSource seems like it could end up in a lot of confusion when we refactor our ExternalASTSources and this unexpectedly breaks. Especially since multiplexing ExternalASTSources is a thing people do (including LLDB itself), so if one source wants this feature on and the other doesn't, then we are in a dead end.

Also I wonder what happens with our real declarations we get from Clang modules (ObjC for now and C++ in the future). They now also share this setting even though they could use asm labels for legitimate reasons (even though I'm not sure I ever saw one being used in practice).

One minor bug I found: You need to turn this on in SemaSourceWithPriorities and maybe also ExternalASTSourceWrapper (see ASTUtils.h) otherwise this seems to regress when using C++ modules and mangling stuff for an expression (I couldn't use ClangExternalASTSourceCommon there as we needed a clang::SemaSource).

This revision now requires changes to proceed.Sep 19 2019, 1:46 PM
vsk updated this revision to Diff 221066.Sep 20 2019, 10:57 AM
vsk retitled this revision from [Mangle] Check ExternalASTSource before adding prefix to asm label names to [Mangle] Add flag to asm labels to disable global prefixing.
vsk edited the summary of this revision. (Show Details)

Thanks for your feedback. I've added a flag to asm labels to disable global prefixing. I've tried to minimize the behavior change in this patch -- it seems to me that additional cleanups can happen in follow-ups.

rjmccall added inline comments.Sep 20 2019, 3:59 PM
clang/include/clang/Basic/Attr.td
725–733

Does this actually change how it can be written in source? I think it might be a good idea to do that in the long run, but let's not in this patch.

Also something really needs to document what this is actually for.

clang/lib/AST/Mangle.cpp
140

Please check for whether the label is literal first before pulling out the global prefix.

clang/lib/Sema/SemaDecl.cpp
2770

This is definitely not a safe assumption; for one, we might eventually allow this to be spelled in source, and for another, you can redeclare something from the global context in LLDB and therefore get this kind of mismatch. Anyway, I don't think you need the assumption here; just add a method to AsmLabelAttr that checks for equality in a way that compensates for the inclusion/suppression of the global prefix.

vsk updated this revision to Diff 221391.Sep 23 2019, 12:43 PM
vsk marked 3 inline comments as done.
  • Address latest review feedback.
rjmccall added inline comments.Sep 23 2019, 1:51 PM
clang/include/clang/Basic/Attr.td
734

LiteralLabel is an unfortunate name for this property because getLiteralLabel sounds like it ought to return a string. How about IsLiteralLabel?

Also there's a "fake" flag to make it clear that this argument isn't (currently) written in source, so this should be BoolArgument<"IsLiteralLabel", /*optional*/ 0, /*fake*/ 1>.

Comment suggestion:

IsLiteralLabel specifies whether the label is literal (i.e. suppresses the global C symbol prefix) or not.
738

isEquivalent, please.

clang/include/clang/Basic/AttrDocs.td
2579

Oh, thank you for adding documentation. I think the prefixing thing probably ought to be explained here; here's a suggestion for some wording:

This attribute can be used on a function or variable to specify its symbol name.

On some targets, all C symbols are prefixed by default with a single character, typically ``_``.  This was done historically to distinguish them from symbols used by other languages.  (This prefix is also added to the standard Itanium C++ ABI prefix on "mangled" symbol names, so that e.g. on such targets the true symbol name for a C++ variable declared as ``int cppvar;`` would be ``__Z6cppvar``; note the two underscores.)  This prefix is *not* added to the symbol names specified by the ``asm`` attribute; programmers wishing to match a C symbol name must compensate for this.

For example, consider the following C code:

<example>

Clang's implementation of this attribute is compatible with GCC's, `documented here <https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html>`_.

While it is possible to use this attribute to name a special symbol used internally by the compiler, such as an LLVM intrinsic, this is neither recommended nor supported and may cause the compiler to crash or miscompile.  Users who wish to gain access to intrinsic behavior are strongly encouraged to request new builtin functions.
clang/lib/AST/Mangle.cpp
130

This is actually backwards, right? A literal label is one that doesn't get the global prefix and therefore potentially needs the \01 prefix to suppress it.

vsk added inline comments.Sep 24 2019, 3:55 PM
clang/lib/AST/Mangle.cpp
130

Thank you for the careful and helpful review! Before updating this patch, I'd like to make sure we're on the same page on this point.

What I'm searching for is a way to prevent mangleName from adding either a \01 prefix or the global prefix to the label contained within an AsmLabelAttr. As the result from mangleName is passed to a hash function, omitting both prefixes is necessary to fix the lldb expression evaluation failure mentioned in the summary. Based on (probably a bad reading of) earlier feedback, I thought that marking an attr as "Literal" would be the preferred method to suppress all prefixes.

However, it sounds like you believe:

  1. An attr marked as "Literal" should have a \01 prefix applied, but no global prefix.
  2. An attr *not* marked as "Literal" should have a global prefix applied (if one is available).

If I've characterized your view correctly, then we really need something else to denote "do not add any kind of prefix". One option is to use BoolArgument<"IsUnprefixed", /*optional=*/0, /*fake=*/1>, and set the literal/non-literal distinction aside.

OTOH, if what you really meant was that mangleName should omit all prefixes for non-"Literal" labels (and that "Literal" is the right name/distinction for this idea), I can simply invert the condition I've been using.

rjmccall added inline comments.Sep 24 2019, 5:09 PM
clang/lib/AST/Mangle.cpp
130

I'm just asking you to invert the condition. At the user level, a "literal" label is one that should be taken "literally", as in, it shouldn't have the global prefix. The \01 prefix is not part of the user-level semantics of asm labels; it's just part of the representation of symbols in LLVM IR, and mangleName should add it or not depending on what gets the right behavior from LLVM. Ultimately, that will include canonicalizing labels so that an asm label that starts with the global prefix (e.g. asm("_foo")) will just be translated appropriately (to @foo rather than @\01_foo), but you've reasonably asked not to take that on in this patch, so all you need to do is make sure that we're only adding \01 when the label is literal and there actually is a global prefix.

vsk updated this revision to Diff 221645.Sep 24 2019, 6:00 PM
vsk retitled this revision from [Mangle] Add flag to asm labels to disable global prefixing to [Mangle] Add flag to asm labels to disable '\01' prefixing.
vsk edited the summary of this revision. (Show Details)
  • Address latest round of comments.

Thanks. One last and minor request: please mention in the comment on IsLiteralLabel that non-literal labels are used by some external AST sources like LLDB.

teemperor resigned from this revision.Sep 25 2019, 2:08 AM

This LGTM modulo rjmccall's comment.

vsk updated this revision to Diff 221801.Sep 25 2019, 10:13 AM
  • Add a comment describing where non-literal labels are used.
rjmccall accepted this revision.Sep 25 2019, 10:28 AM

Thanks, LGTM.

This revision is now accepted and ready to land.Sep 25 2019, 10:28 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2019, 11:00 AM
aaron.ballman added inline comments.
cfe/trunk/lib/AST/Mangle.cpp
127 ↗(On Diff #221805)

Sorry to dredge up an old review, but I recently ran into a bug in this area and am not certain of how to fix it. What should happen if the asm label is a literal which is empty? e.g.,

void foo(void) __asm__("");
void foo(void) {}

Currently, that seems to get through this code path and stream an empty string into the output stream. However, writeFuncOrVarName() doesn't check that it actually wrote anything into the stream when calling mangleName() and returns success to the caller. This causes writeName() to eventually call llvm::Mangler::getNameWithPrefix(), which asserts that the passed StringRef is not an empty string and we crash. The typical LLVM code path seems to treat unnamed symbols as being mangled with __unnamed_N (where N is replaced), but this relies on having a GlobalValue* available to get the next mangling ID.

I'm not certain of where to fix the bug. Should the LLVM side be able to accept an empty string and try to gin up a GlobalValue* to mangle it properly? Should the FE side not generate an empty name here? Should empty asm labels be rejected at the source level?

Thanks for any insights you can provide!