Page MenuHomePhabricator

Refactor implementation of -funique-internal-linkage-names.
ClosedPublic

Authored by tmsriram on Feb 4 2021, 11:11 PM.

Details

Summary

The option -funique-internal-linkage-names was added in D73307 and D78243 as a LLVM early pass to insert a unique suffix to internal linkage functions and vars. The unique suffix was the hash of the module path. However, we found that this can be done more cleanly in clang early and the fixes that need to be done later can be completely avoided. The fixes in particular are trying to modify the DW_AT_linkage_name and finding the right place to insert the pass.

This patch ressurects the original implementation proposed in D73307 which was reviewed and then ditched in favor of the pass based approach.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
tmsriram requested review of this revision.Feb 4 2021, 11:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2021, 11:11 PM
hoy added a comment.Feb 5 2021, 9:55 AM

Thanks for moving this feature into the front end. It looks quite clean.

clang/lib/CodeGen/CGCall.cpp
2158

Can a test be made for this?

clang/lib/CodeGen/CodeGenModule.cpp
205

Will this have a unique dwarf debug name automatically generated? I'm wondering if we should make a test for this.

(might be simpler to add the new functionality to Clang, having Clang opt out of the old functionality (to avoid both unique prefixes being added together, I guess?) and then separately remove the LLVM functionality in a follow-up patch, once it's unused by Clang.

clang/lib/CodeGen/CodeGenModule.cpp
202

Can remove some syntax here:

wmi added a comment.Feb 6 2021, 11:04 AM

It is a good move. I see that in old pass manager, the initialization phases of all passes run before the body of any pass starts, so for many passes, they will see different names for the same symbol in initialization phase and main phase. SampleProfileLoaderPass is affected for example. It is less of a problem for new pass manager. However, the requirement of putting UniqueInternalLinkageNamesPass very early in the pipeline is still a very strong requirement. It is possible for other pass to have the same requirement and to cause conflict. So like to see the refactoring and thanks for the improvement!

rnk added inline comments.Feb 8 2021, 12:26 PM
clang/lib/CodeGen/CodeGenModule.cpp
1281

This raises the original concern that I had that pushed me in the direction of doing things as a pass: How does this interact with multi-version functions? It probably just works, but I remember this was something that made me think an IR pass would be simpler, or more likely to do the right thing.

Something else to think about: clang has other ways to generate internal functions with non-unique names. Dynamic initialization comes to mind (__cxx_global_var_init). Consider:

template <typename T>
struct Foo { static int gv; };
int f(int);
template <typename T>
int Foo<T>::gv = f(sizeof(T));
void escape(int*);
void instantiate() {
  escape(&Foo<int>::gv);
  escape(&Foo<short>::gv);
}

The IR pass was renaming these for us previously.

tmsriram added inline comments.Feb 8 2021, 1:12 PM
clang/lib/CodeGen/CodeGenModule.cpp
1281

I had a test for multi versioned functions, clang/test/CodeGen/unique-internal-linkage-names.cpp. Here is the gist though. foo() with multi-version clones for sse4.2 and avx will produce the following function symbols:

  1. _Z3foov.__uniq.1234
  2. _Z3foov.sse4.2.__uniq.1234
  3. _Z3foov.avx.__uniq.1234
  4. _Z3foov.__uniq.1234.resolver

Multi-versioned suffixes are not demangler friendly in general as such, even without this patch. For instance, _Z3foov.sse4.2 will not demangle, Is there a specific concern?

rnk added inline comments.Feb 8 2021, 4:13 PM
clang/lib/CodeGen/CodeGenModule.cpp
1281

Mainly, I'd just like to keep that test coverage. The logic moved to the frontend, so now that should become a frontend test.

Just from reading the patch, I'm guessing the names now become _Z3foov.__uniq.1234.avx/sse/etc, but I could be wrong. It would be good to encode our expectations one way or another.

tmsriram added inline comments.Feb 8 2021, 5:43 PM
clang/lib/CodeGen/CodeGenModule.cpp
1281

I like the module hash coming before the multi-version target suffix too like you pointed out. I just need to re-order the code in getMangledNameImpl. I will do that.

tmsriram updated this revision to Diff 323179.Feb 11 2021, 3:53 PM
tmsriram marked 4 inline comments as done.

This patch update handles the following:

  1. For C internal linkage functions which do not mangle implicitly, force mangling when -funique-internal-linkage-names is present before adding the suffix. This ensures demanglers work correctly on the linkage name as required.
  2. The mangler works on a decl and does not have much external context. Add an implicit attribute to the decl to indicate to the mangler that a function name must be mangled.
  3. Add tests for DW_AT_linkage_name for C and C++ functions.
  4. Check for "sample-profile-elision-policy" attribute.

This patch addresses all the short comings pointed out in D73307 where the linkage names had to be updated explicitly and C functions could not be handled effectively.

tmsriram marked an inline comment as done.Feb 11 2021, 3:55 PM
aaron.ballman added inline comments.Feb 12 2021, 6:25 AM
clang/include/clang/Basic/Attr.td
2159 ↗(On Diff #323179)

Should this attribute be inherited by redeclarations?

2165 ↗(On Diff #323179)
tmsriram added inline comments.Feb 12 2021, 12:03 PM
clang/include/clang/Basic/Attr.td
2159 ↗(On Diff #323179)

The attribute is added in EmitTopLevelDecl in CodeGenModule which happens after ActOnDeclarator in SemaDecl.cpp. I dont think this needs to be inherited but I am not sure. Is there an example I can use to check if this is fine?

aaron.ballman added inline comments.Feb 18 2021, 5:55 AM
clang/include/clang/Basic/Attr.td
2159 ↗(On Diff #323179)

I was thinking of something along these lines, where the initial declaration is explicitly marked static and the definition is not:

static int foo(void);
int foo(void) { return 12; }

int main() {
  return foo();
}

However, I realize now that it shouldn't matter because the attribute isn't attached until codegen time. FWIW, this means the attribute won't appear in the AST, which means things like clang-tidy or third-parties that consume the AST won't have access to this information.

clang/lib/CodeGen/CodeGenModule.cpp
5629

It's a bit novel to add the attribute at codegen time rather than at sema -- is there a reason this attribute should not be added to the AST from within Sema?

dblaikie added inline comments.Feb 18 2021, 10:48 AM
clang/lib/CodeGen/CodeGenModule.cpp
5629

How would it be if this didn't use an attribute at all? If this checking was done in ItaniumMangle directly, rather than checking done here, transforming the result of the checking into an attribute, and then checking the attribute later?

tmsriram added inline comments.Feb 18 2021, 11:06 AM
clang/lib/CodeGen/CodeGenModule.cpp
5629

@dblaikie That is what I tried to do first. I didnt find an easy way to do this without either storing some state in the Mangler about every such decl or adding something to the decl itself. I guess the Mangler needs to know 3 things:

  1. That funique-internal-linakge-names is enabled - maybe I can add a bool
  2. That the language is not C++ - This is already possible
  3. That the decl has internal linkage - this may require adding more state to the mangler

I think it is 3) that would require some plumbing unless I missed something already existing.

tmsriram added inline comments.Feb 18 2021, 11:22 AM
clang/lib/CodeGen/CodeGenModule.cpp
5629

I will revisit this, at a high level I think the mangler should know about the linkage of the decl as it does "_ZL" for this. I think I missed it.

aaron.ballman added inline comments.Feb 18 2021, 11:34 AM
clang/lib/CodeGen/CodeGenModule.cpp
5629

How would it be if this didn't use an attribute at all? If this checking was done in ItaniumMangle directly, rather than checking done here, transforming the result of the checking into an attribute, and then checking the attribute later?

I think that would have the same oddity, wouldn't it? However, I didn't do a good job of explaining my concern, which may not actually be a valid concern: would someone ever want to use this information from, say, an AST matcher for a clang-tidy check? Or is this purely a way to plumb information through the compiler without adding extra bits on a FunctionDecl? My concern was mostly that someone may want to do an AST match on this and they won't be able to because the attribute is added too late in the process.

We currently use 40 bits within FunctionDeclBitfields, so adding a single bit there does not seem like it would be the end of the world (but I didn't check thoroughly for issues with adding another bit there), so that's an option for squirreling the information away without something as heavy as an attribute.

tmsriram added inline comments.Feb 18 2021, 12:39 PM
clang/include/clang/Basic/Attr.td
2159 ↗(On Diff #323179)

Ack. I will make sure I check for this case and also see if I can avoid using the attribute in some manner.

dblaikie added inline comments.Feb 18 2021, 1:23 PM
clang/lib/CodeGen/CodeGenModule.cpp
5629

the mangler has the clang::Function, so I'd have thought it could figure out the linkage from that - but I realize the linkage calculations are a bit more complicated/probably need some other sources

tmsriram updated this revision to Diff 324886.Feb 18 2021, 10:59 PM

Removed the atttribute and directly added logic in ItaniumMangle.cpp to check if decl is internal linkage as per @dblaikie 's suggestion.

ItaniumMangle.cpp generates "_ZL" mangling for internal linkage symbols for C functions. Reused the same logic to check if a decl is internal linkage and needs a unique name. Added one extra bit to ItaniumMangle.cpp to record that -funique-internal-linkage-names is used.

tmsriram updated this revision to Diff 324895.Feb 18 2021, 11:26 PM

Fix a bug.

dblaikie added inline comments.Feb 19 2021, 12:32 PM
clang/lib/AST/Mangle.cpp
119–124

Strikes me as strange to test the language mode here (presumably this logic would apply equally to C++ code, right? If at most it'd be a no-op because things are already mangled)

But, for instance - what about functions with extern "C" linkage in a C++ file?

clang/lib/CodeGen/CodeGenModule.cpp
205

Does this need the double underscore? I thought other suffixes here generally don't have that feature?

206

Might be worth a parameter name comment explaining what the 'false' parameter means? (I guess the 10 means the numeric base to use)

1184–1187

The variety of things being tested here seems non-trivial, and I've no idea how to validate it's correct. Any chance this is existing logic that could be reused from somewhere else/refactored into a common function to call from some existing use case and this new one? (similar thougts on ItaniumDemangle.cpp:isInternalLinkageDecl)

tmsriram added inline comments.Feb 19 2021, 5:20 PM
clang/lib/AST/Mangle.cpp
119–124

Internal linkage functions under extern "C" are already mangled even without this patch. I was surprised too. Here is the example:

extern "C" {
  static int foo(void) {
   return 0; 
  }
  static int l
  int bar() {
    return foo() + l; 
}
}
$ clang++ -c foo.cc
$ nm foo.o
T bar
t _ZL3foov
b _ZL1l
clang/lib/CodeGen/CodeGenModule.cpp
205

This was discussed in D89617 to have two underscores before uniq as the convention when I first added this identifier. Not sure if you want to revisit that.

1184–1187

So the Itanium's internal linakge decl check is exactly the logic that is used to determine when to add the "_ZL" to internal linkage names. I also refactored the original code to share the logic. For CodeGenModule.cpp, I am not sure what to use as C++ has other cases where internal linkage could be used, like anonymous namespaces maybe. I will try to dig a little more here.

dblaikie added inline comments.Feb 20 2021, 11:09 AM
clang/lib/AST/Mangle.cpp
119–124

ah, right, my bad.

clang/lib/CodeGen/CodeGenModule.cpp
205

Ah, thanks for the pointer! Might be worth documenting in a comment (if this is the only place that string appears)

Do you have a rough sense/list of the suffixes used by LLVM or other implementations at the moment (I see some code in LLVM that looks for ".part." and ".llvm." at least) - I don't know that "__" really provides much. The double underscore rule is to differentiate user symbols from C++ symbols in the C++ standard, but doesn't do much/anything to protect one implementation from another implementation, for instance.

1184–1187

Thanks!

MaskRay edited reviewers, added: rsmith; removed: HAPPY.Feb 21 2021, 4:35 PM
MaskRay added inline comments.
clang/test/CodeGen/unique-internal-linkage-names-dwarf.c
2

The test can be converted to IR test (-emit-llvm) for better separation of concerns.

Then llvm-nm & llvm-dwarfdump can be removed here.

tmsriram added inline comments.Feb 22 2021, 12:33 PM
clang/test/CodeGen/unique-internal-linkage-names-dwarf.c
2

I tried that but DW_AT_linkage_name is not emitted in the IR directly. This needs to be directly checked, I can check in assembly alternately. WDYT?

tmsriram added inline comments.Feb 22 2021, 12:35 PM
clang/test/CodeGen/unique-internal-linkage-names-dwarf.c
2

Oh darn it is generated as metadata, my bad. Please ignore my previous comment.

tmsriram updated this revision to Diff 325650.Feb 22 2021, 7:31 PM
tmsriram marked 12 inline comments as done.

Address reviewer comments:

  • Add all possible cases where internal linkage symbols can be created with C++
  • Convert tests to emit LLVM IR
dblaikie accepted this revision.Feb 23 2021, 5:18 PM

Looks good to me - maybe give it a week or so in case other reviewers have some unresolved concerns.

This revision is now accepted and ready to land.Feb 23 2021, 5:18 PM

D8243

Did you link to the wrong revision?

clang/test/CodeGen/unique-internal-linkage-names-dwarf.cpp
5

excess spaces after -o -

clang/test/CodeGen/unique-internal-linkage-names.cpp
62

If you want to test a function attribute

// UNIQUE: ... @function() [[#ATTR:]]
// UNIQUE: attributes [[#ATTR]] = {{{.*}}"sample-profile...}}

The current test does not test which definition this attribute is attached to.

tmsriram updated this revision to Diff 325963.Feb 23 2021, 7:11 PM
tmsriram edited the summary of this revision. (Show Details)
tmsriram marked 2 inline comments as done.

Address reviewer comments:

  • Fix attribute checking to check the right function
  • Correct typos

Looks good to me - maybe give it a week or so in case other reviewers have some unresolved concerns.

Thanks! Regarding double-underscore for prefixes, there is also "part" which is used by function parts created via basic block sections. I dont have a strong opinion on the usage of "" and I can reopen the discussion if needed.

MaskRay accepted this revision.Feb 23 2021, 7:37 PM
MaskRay added inline comments.
clang/test/CodeGen/unique-internal-linkage-names.cpp
57
tmsriram updated this revision to Diff 325974.Feb 23 2021, 9:09 PM
tmsriram marked an inline comment as done.

Address reviewer comment to use numeric substitution block.

hoy accepted this revision.Mar 5 2021, 10:14 AM

LGTM. It addresses the C scenario smartly. Thanks.

This revision was landed with ongoing or failed builds.Mar 5 2021, 1:34 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2021, 1:34 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript