This is an archive of the discontinued LLVM Phabricator instance.

Solution to fix PR27066 - Redefinition with same mangled name as another definition (dllexport and uuid)
Needs ReviewPublic

Authored by zahiraam on Feb 21 2018, 7:53 AM.

Details

Summary

In this (partial) test case:
...
struct declspec(dllexport) C1 : public A <&uuidof(S1)>
{};
struct declspec(dllexport) C2 : public A<&uuidof(S2)>
{};
...

clang should be generating a single instantiation of A. It is instead generating two of them. This happens only in the presence of uuid.

Compiling this with clang this test case (dllexport with no uuid):
template <const char* s> class A {};
char s = 'a';
struct declspec(dllexport) C1 : public A <&s>
{};
struct
declspec(dllexport) C2 : public A<&s>
{};

has the same(correct) behavior than MSVC:

ksh-3.2$ dumpbin /symbols t3.o | grep "class A"
008 00000000 SECT4 notype () External | ??4?$A@$1?s@@3DA@@QEAAAEAV0@AEBV0@@Z (public: class A<&char s> & cdecl A<&char s>::operator=(class A<&char s> const &))
00D 00000000 SECT5 notype () External | ??4?$A@$1?s@@3DA@@QEAAAEAV0@$$QEAV0@@Z (public: class A<&char s> &
cdecl A<&char s>::operator=(class A<&char s> &&))
ksh-3.2$

The solution proposed in this patch is the generation a new template argument for uuid.

Let me know if you agree.

Diff Detail

Event Timeline

zahiraam created this revision.Feb 21 2018, 7:53 AM

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

It seems to me that the test here is very much lacking. It doesn't seem to include the example you've mentioned, and has no validation to ensure that there is a single instantiation happening. I'd like to see what happens when a UUID is passed as a pack, how SFINAE works on it, etc.

include/clang/AST/RecursiveASTVisitor.h
847

No need for curly brackets.

888

Curly brackets likely not necessary here.

include/clang/AST/TemplateBase.h
214 ↗(On Diff #135250)

This function overload should be documented.

lib/AST/ASTContext.cpp
5066 ↗(On Diff #135250)

Not only for you here, but is there any reason why this cannot just be a fallthrough for the ones above? I suspect we'd prefer to get those 3 all combined.

lib/AST/MicrosoftMangle.cpp
1426 ↗(On Diff #135250)

If you combine the getAsUuidExpr line and the mangleExpession line, you can get rid of curlies, and be more consistent with the surrounding code.

lib/AST/TemplateBase.cpp
581 ↗(On Diff #135250)

Why is much of this required? Wouldn't just calling printPretty with the current policy work? Why use a separate stream rather than the 'DB' stream?

lib/Sema/SemaTemplate.cpp
4629 ↗(On Diff #135250)

Is this section clang-formatted? It seems a little oddly newlined.

zahiraam updated this revision to Diff 135502.Feb 22 2018, 12:25 PM

Adding test case.

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

Not sure I fully understand what you are proposing?

Are you proposing that generated symbols like this:
??4?$A@$E?_GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3@@3U__s_GUID@@B@@QEAAAEAV0@AEBV0@@Z
Be de-sugared? Wouldn't that be different that what MS is doing?
Can you please give me more details about what you are thinking of?
Thanks.

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

Not sure I fully understand what you are proposing?

Are you proposing that generated symbols like this:
??4?$A@$E?_GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3@@3U__s_GUID@@B@@QEAAAEAV0@AEBV0@@Z
Be de-sugared? Wouldn't that be different that what MS is doing?
Can you please give me more details about what you are thinking of?
Thanks.

We create an AST such that the following:

struct _GUID;

template <const _GUID *>
class A {};

struct __declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}")) S {};

struct __declspec(dllexport) C : public A<&__uuidof(S)> {};

is turned into something like:

struct _GUID;

__declspec(selectany) __s_GUID const _GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3 = {...};

template <const _GUID *>
class A {};

struct __declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}")) S {};

struct __declspec(dllexport) C : public A<&_GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3> {};

Do we need to also track whether the argument is a pointer or reference to a UUID (and also the cv-qualifiers)? For the Declaration case, we track this by tracking the corresponding parameter type; the same thing would presumably work here.

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

This desugaring approach is not how we generally do things in Clang. The fact that MS exposes a variable that can be named from user code is, in my opinion, simply a bug in their implementation -- their implementation details are leaking -- and not part of the actual semantics here. I view this as exactly analogous to typeid (which would have exactly the same problems if its result could be used as a non-type template parameter); as with typeid, __uuidof notionally produces a global object not corresponding to any variable. If we want to model this as a declaration, we could add a new Decl subclass for these uuid objects (and eventually also for objects produced by typeid). But I don't think we should model them as variables unless that's actually part of their intended semantics.

Do we need to also track whether the argument is a pointer or reference to a UUID (and also the cv-qualifiers)? For the Declaration case, we track this by tracking the corresponding parameter type; the same thing would presumably work here.

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

This desugaring approach is not how we generally do things in Clang. The fact that MS exposes a variable that can be named from user code is, in my opinion, simply a bug in their implementation -- their implementation details are leaking -- and not part of the actual semantics here. I view this as exactly analogous to typeid (which would have exactly the same problems if its result could be used as a non-type template parameter); as with typeid, __uuidof notionally produces a global object not corresponding to any variable. If we want to model this as a declaration, we could add a new Decl subclass for these uuid objects (and eventually also for objects produced by typeid). But I don't think we should model them as variables unless that's actually part of their intended semantics.

Here's my thinking: the __uuidof expression literally declares a variable called _GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3 of type __s_GUID which is why it behaves the way it does: https://godbolt.org/g/74FY7U

I don't think it is reasonable to invent new semantics which are different from the MSVC ones because we find the MSVC ones inelegant. The huge upside I see from matching their behavior is that the implementation is dramatically simpler, we have a considerable amount of mumbo jumbo in the template instantiation code to handle __uuidof correctly and it looks like we will need some more.

What is the relative upside to a new kind of Decl? Better AST fidelity?

Here's my thinking: the __uuidof expression literally declares a variable called _GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3 of type __s_GUID which is why it behaves the way it does: https://godbolt.org/g/74FY7U

This is an implementation detail leaking, though. no? Note that that is a reserved name.

I don't think it is reasonable to invent new semantics which are different from the MSVC ones because we find the MSVC ones inelegant.

I mostly agree, but my point is that this is *not* the MSVC semantics, it's merely an implementation detail that non-conforming code happens to be able to observe. Suppose that type_info objects were similarly accessible in MSVC by guessing their mangled names. Would you be arguing that we should inject variables for those too? (And note that it is *nearly* true that type_info objects work that way: https://godbolt.org/g/zByFFg -- but the parser gets confused somehow when you reference them.) The only difference I can see between these cases is that the reserved name used for the GUID case happens to not contain any ?s and @s, so happens to be utterable as an identifier.

We should not attempt to be compatible with the cases where MSVC's implementation details happen to leak into user-visible semantics.

What is the relative upside to a new kind of Decl? Better AST fidelity?

Yes, exactly. The same reason we don't desguar other things any more than we have to.

Here's my thinking: the __uuidof expression literally declares a variable called _GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3 of type __s_GUID which is why it behaves the way it does: https://godbolt.org/g/74FY7U

This is an implementation detail leaking, though. no? Note that that is a reserved name.

I don't think it is reasonable to invent new semantics which are different from the MSVC ones because we find the MSVC ones inelegant.

I mostly agree, but my point is that this is *not* the MSVC semantics, it's merely an implementation detail that non-conforming code happens to be able to observe. Suppose that type_info objects were similarly accessible in MSVC by guessing their mangled names. Would you be arguing that we should inject variables for those too? (And note that it is *nearly* true that type_info objects work that way: https://godbolt.org/g/zByFFg -- but the parser gets confused somehow when you reference them.) The only difference I can see between these cases is that the reserved name used for the GUID case happens to not contain any ?s and @s, so happens to be utterable as an identifier.

We should not attempt to be compatible with the cases where MSVC's implementation details happen to leak into user-visible semantics.

What is the relative upside to a new kind of Decl? Better AST fidelity?

Yes, exactly. The same reason we don't desguar other things any more than we have to.

Before I start implementing this I want to make sure that we are on the same page on this.
From your comments it looks like we do not want to create global variables for the uuid globals. I tend to agree with that. I do like the idea of creating a new Decl type for the uuid declspec. But I want to make sure that I understand what you are referring to.
Currently this declaration:
struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S1;

a CXXRecordDecl type is generated with attributes (the uuid being an attribute). Are you proposing that instead a new type is generated for S1 say something like CXXUuidRecord? And create also a new TagType that corresponds to this new Decl?

Currently this declaration:
struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S1;

a CXXRecordDecl type is generated with attributes (the uuid being an attribute). Are you proposing that instead a new type is generated for S1 say something like CXXUuidRecord? And create also a new TagType that corresponds to this new Decl?

No. Concretely, I'd suggest we create a new UuidDecl object representing the _GUID object. An instance of UuidDecl would be created and owned by the uuid attribute, and __uuidof(X) would denote the UuidDecl owned by the uuid attribute on the CXXRecordDecl. We'd also need some way to form redeclaration chains for UuidDecls (which means we'll need a map from UUID to UuidDecl somewhere, perhaps on the ASTContext).

We should really, really avoid making this sort of change without first trying to desugar uuidof into a reference to a variable. That would solve a ton of problems, problems like this one.

Not sure I fully understand what you are proposing?

Are you proposing that generated symbols like this:
??4?$A@$E?_GUID_ddb47a6a_0f23_11d5_9109_00e0296b75d3@@3U__s_GUID@@B@@QEAAAEAV0@AEBV0@@Z
Be de-sugared? Wouldn't that be different that what MS is doing?
Can you please give me more details about what you are thinking of?
Thanks.

Currently this declaration:
struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S1;

a CXXRecordDecl type is generated with attributes (the uuid being an attribute). Are you proposing that instead a new type is generated for S1 say something like CXXUuidRecord? And create also a new TagType that corresponds to this new Decl?

No. Concretely, I'd suggest we create a new UuidDecl object representing the _GUID object. An instance of UuidDecl would be created and owned by the uuid attribute, and __uuidof(X) would denote the UuidDecl owned by the uuid attribute on the CXXRecordDecl. We'd also need some way to form redeclaration chains for UuidDecls (which means we'll need a map from UUID to UuidDecl somewhere, perhaps on the ASTContext).

Currently this declaration:
struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S1;

a CXXRecordDecl type is generated with attributes (the uuid being an attribute). Are you proposing that instead a new type is generated for S1 say something like CXXUuidRecord? And create also a new TagType that corresponds to this new Decl?

No. Concretely, I'd suggest we create a new UuidDecl object representing the _GUID object. An instance of UuidDecl would be created and owned by the uuid attribute, and __uuidof(X) would denote the UuidDecl owned by the uuid attribute on the CXXRecordDecl. We'd also need some way to form redeclaration chains for UuidDecls (which means we'll need a map from UUID to UuidDecl somewhere, perhaps on the ASTContext).

As per Richard comment. I have added a UuidDecl owned by the uuid attribute. This patch concerns only this change. It is not a patch to fix the bug yet. I want to make sure that we are in agreement with my changes first. Then I will change the code that deal with ParseCXUuid.
Comments are welcome.
Thanks.

zahiraam updated this revision to Diff 142516.Apr 14 2018, 8:25 AM
rsmith added inline comments.Apr 14 2018, 8:38 AM
include/clang/Sema/AttributeList.h
601 ↗(On Diff #142516)

You're not storing a DeclSpecUuidDecl within the attribute, so you don't need to reserve enough space for one here. In fact, this constant appears to be unused, so you can just remove it.

lib/AST/Decl.cpp
4813

No space after *, please.

4816

{ on the previous line, please.

lib/AST/DeclCXX.cpp
1796–1797

It would be nice to have a convenience accessor on UuidAttr to get the UUID as a string. (You can add one using AdditionalMembers in Attr.td.)

lib/Parse/ParseDecl.cpp
568–583 ↗(On Diff #142516)

The Parser should not create Decls (that's a layering violation). There are two ways to handle this:

  1. Add a Sema method to act on a UUID attribute string, and return a pointer that you can stash in the Attrs list.
  2. Just store a string in the Attrs list, and move the code to convert the string into a DeclSpecUuidDecl* and store that on the UuidAttr into Sema.

I'd strongly prefer the second option; from the point of view of the parser, we can still just treat this as an attribute that takes a string, and we can do the string -> DeclSpecUuidDecl handling entirely within the semantic analysis for the attribute (in Sema).

572 ↗(On Diff #142516)

What should happen if multiple declarations (of distinct entities) use the same __declspec(uuid(...)) attribute? Should you get a redefinition error for the attribute or should they all share the same UUID entity? Either way, we'll want to do some (minimal) UUID lookup and build a redeclaration chain for DeclSpecUuidDecls.

lib/Sema/SemaDeclAttr.cpp
5406–5407

Do we still diagnose UUID mismatches somewhere else?

zahiraam marked 4 inline comments as done.Apr 15 2018, 9:05 AM
zahiraam added inline comments.
lib/Parse/ParseDecl.cpp
572 ↗(On Diff #142516)

If we want to align with MS, we don't want to signal an error. So may be I should have a map that assigns to each StrUuid a list of DeclSpecUuid?
So for this code:
struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S1 {};

struct
__declspec(uuid("{DDB47A6A-0F23-11D5-9109-00E0296B75D3}"))
S2 {};

I will have a map from DDB47A6A-0F23-11D5-9109-00E0296B75D3 to {S1, S2}. Do you agree?

lib/Sema/SemaDeclAttr.cpp
5406–5407

Not as far as I know. I guess I should put this diag somewhere?

zahiraam marked 3 inline comments as done.Apr 20 2018, 8:30 AM
zahiraam added inline comments.
lib/Sema/SemaDeclAttr.cpp
5406–5407

Fixed the initial code.

zahiraam updated this revision to Diff 143321.Apr 20 2018, 8:33 AM

Richard,
Please let me know if I have answered to all the issues you raised. Thanks.

zahiraam updated this revision to Diff 146642.May 14 2018, 10:50 AM
zahiraam updated this revision to Diff 191371.Mar 19 2019, 12:54 PM

It would be nice to have a review for this year old (updated) patch. Thanks.

rnk added a comment.Mar 19 2019, 2:47 PM

I see! I only took a quick look, but this looks exactly like what @rsmith has been asking for in discussions for a long time now: a more explicit AST representation of uuid of uuidof in template arguments. I'll make an effort to get his attention and see if this addresses his concerns.

Thanks, this looks like a good broad direction.

I think this patch does not goes far enough yet. Broad comments (partially duplicating some of the specific comments on the patch itself):

  • form the DeclSpecUuidDecl earlier, when parsing the attribute argument
  • store a pointer to the DeclSpecUuidDecl on a non-dependent CXXUuidofExpr
  • remove the maps in Sema, and instead ask attributes and uuidof expressions for their declarations
  • DeclSpecUuidDecl creation needs to either deduplicate declarations with the same UUID or form redeclaration chains for such declarations
  • we are missing at least serialization and deserialization code for the new declaration kind (and likely other things too: ASTImporter support, for instance)
  • we should be able to emit DeclSpecUuidDecls in CodeGen, replacing our current ad-hoc emission of UUID globals
include/clang/Sema/ParsedAttr.h
222

I don't really like adding new special kinds of argument representation in ParsedAttr, but... this really is a very special kind of argument, with rules different from anything else.

Have you considered forming the DeclSpecUuidDecl object earlier (when the argument is parsed) and storing it here? That'd be a closer match for how we handle other forms of attribute arguments (particularly expressions).

564

Copy-paste error in assert message

include/clang/Sema/Sema.h
408–409

Remove this? It is only used once, and only to initialize an unused variable.

413

Use an llvm::StringMap here? But I'm also not convinced this should exist at all.

414

This seems suspicious. If we want the uuid of a declaration, should we not be asking it for its UuidAttr instead?

lib/Parse/ParseDecl.cpp
588 ↗(On Diff #191371)

This code is dead: note that we returned on the line before.

589 ↗(On Diff #191371)

This comment does not describe what this code does.

594 ↗(On Diff #191371)

How does this handle the case where multiple tokens must be concatenated to form a uuid? Eg, uuid(12345678-9abc-def0-1234-56789abcdef0)

lib/Sema/SemaDeclAttr.cpp
5405

We should be comparing to see if the declarations are the same here (use declaresSameEntity on the UUID decls).

5434–5439

Sema should not be dealing with lexical issues like this. This should have been handled when parsing the attribute argument. (Also this is missing handling for other encoding prefixes and suffixes.)

lib/Sema/SemaExprCXX.cpp
629–635

I think this should be finding the UuidAttr of the declaration and asking it for its DeclSpecUuidDecl.

636

This variable is unused. (And it's not clear why we would assume that the most-recently-created DeclSpecUuidDecl would correspond to anything relevant here.)

637–640

We should store a pointer to the UUID declaration on a non-dependent CXXUuidofExpr.

lib/Sema/SemaTemplateInstantiateDecl.cpp
635–639

Just use llvm_unreachable here; DeclSpecUuidDecls are global and shouldn't be instantiated.

test/Parser/MicrosoftExtensions.cpp
79 ↗(On Diff #191371)

Why is this error now appearing on the wrong line?

test/Parser/ms-square-bracket-attributes.mm
22–23 ↗(On Diff #191371)

This is a diagnostic quality regression. The attribute does not require a string. But it does not accept a string with an encoding prefix.

utils/TableGen/ClangAttrEmitter.cpp
332

Nit: no need for parens here, and bad indentation.

zahiraam updated this revision to Diff 193258.Apr 2 2019, 5:18 AM
zahiraam marked 15 inline comments as done.

There are still a few things I haven't addressed yet. Mostly because not sure there is another solution like getting rid of the map from StringRef to expr. The other issue is not adding new kind to ParsedAttr. There may be another way of doing it, but didn't look at it yet.
Meanwhile can you please let me know if you are happy with the current status of the implementation.

lib/Parse/ParseDecl.cpp
594 ↗(On Diff #191371)

ksh-3.2$ cat m3.cpp
struct declspec (uuid(12345678-9abc-def0-1234-56789abcdef0)) s;
ksh-3.2$ clang -c m3.cpp
m3.cpp:1:35: error: invalid digit 'a' in decimal constant
struct
declspec (uuid(12345678-9abc-def0-1234-56789abcdef0)) s;

^

m3.cpp:1:39: error: use of undeclared identifier 'def0'
struct __declspec (uuid(12345678-9abc-def0-1234-56789abcdef0)) s;

^

m3.cpp:1:54: error: invalid digit 'a' in decimal constant
struct __declspec (uuid(12345678-9abc-def0-1234-56789abcdef0)) s;

^

3 errors generated.
ksh-3.2$

rsmith added inline comments.May 10 2019, 11:38 AM
include/clang/AST/Decl.h
4289

Can we store a StringLiteral* here instead?

include/clang/Sema/Sema.h
408–409

Comment marked done but not done?

414

Comment marked done but not done?

lib/Parse/ParseDecl.cpp
594 ↗(On Diff #191371)

This case is accepted by MSVC and not here:

struct __declspec(uuid(R"(12345678-9abc-def0-1234-56789abcdef0)")) s;

(MSVC also accepts a u8 prefix, but no other encoding prefix. I also note that MSVC does not support string literal concatenation here, so we don't need to deal with that.)

Please use StringLiteralParser to properly handle cases like this, rather than rolling your own string literal parsing here.

572 ↗(On Diff #142516)

I think you should have a map from UUID to either the most recent DeclSpecUuidDecl for that UUID, and form redeclaration chains for DeclSpecUuidDecls with the same UUID. (DeclSpecUuidDecl should derive from Redeclarable<DeclSpecUuidDecl> and you can call setPreviousDecl to connect the new declaration to the prior one.)

You'll need to make sure this case is properly handled during lazy AST deserialization.

lib/Sema/SemaExprCXX.cpp
633

Do we need both this and getUuidAttrOfType?

649

It would seem cleaner to collect the UuidAttrDecls or UuidAttrs here rather than the declarations that have those attributes.

659–665

Revert this change; we need to build a new CXXUuidofExpr with a correct source location and type rather than reusing one from somewhere else. Then remove UuidExpMap.

661–662

Dead code.

zahiraam marked 12 inline comments as done.May 20 2019, 5:55 AM

@rsmith I think I have made all the changes you have pointed out to. But please note that this new patch only impements an explicit AST representation of uuid in template arguments. It does not fix the bug for which this review was opened for.
I will take care of the bug in time. But before doing that I want to make sure that the changes required to give an explicit AST for uuid is correct.

Thanks for taking the time to review. And sorry my response is slow to come. This work is done in my "spare" time.

lib/Sema/SemaExprCXX.cpp
633

No.

637–640

Not sure what that means.

637–640

However the test case is still failing. Still need to find a solution for the fail.

zahiraam updated this revision to Diff 200258.May 20 2019, 5:56 AM
zahiraam marked 2 inline comments as done.

And this patch actually fixes the bug. Thanks.

zahiraam updated this revision to Diff 200484.May 21 2019, 6:21 AM

A review please :-) Thanks.

rsmith added inline comments.Jun 21 2019, 2:19 PM
include/clang/AST/Decl.h
4303

What does "STL" mean here?

include/clang/Sema/ParsedAttr.h
337–346

I don't think we need a custom constructor for this any more; a StringLiteral is an Expr, so can be stored as a regular argument. It's also suspicious that this constructor doesn't use its stluuid parameter -- maybe it's already unused?

822

(Likewise we shouldn't need this any more...)

1031

(... or this.)

include/clang/Sema/Sema.h
408

This map is not necessary (you can get the DeclSpecUuidDecl from the UuidAttr on the Decl) and not correct (it's not serialized and deserialized). Please remove it.

lib/CodeGen/CodeGenModule.cpp
1071–1073

Was this supposed to be included in this patch? It looks like this is papering over a bug elsewhere.

lib/Sema/SemaDeclAttr.cpp
5405–5408

This if should be:

if (declaresSameEntity(UA->getDeclSpecUuidDecl(), Uuid))

(Not a string comparison. We already combined UuidDecls with the same string into the same entity.)

5452–5460

You already have a suitable string literal as AL.getArgAsExpr(0). There's no need to fabricate another one here.

5471–5474

Please combine these lines.

5473–5474

You should check for a prior DeclSpecUuidDecl with the same UUID here and set it as the previous declaration of the new one, so that they're treated as declaring the same entity.

5476

(Remove this; with the other change I suggested above this should be unnecessary.)

zahiraam updated this revision to Diff 207882.Jul 3 2019, 1:39 PM
zahiraam marked 14 inline comments as done.

Thanks for the review.
I think and hope that I have responded to every issue you raised. Let me know if there are still pending issues.
Happy 4th!

include/clang/AST/Decl.h
4303

Renamed it.

lib/CodeGen/CodeGenModule.cpp
1071–1073

This is the code that actually fixes the bug. The rest of the patch is to represent uuid in the AST.

1071–1073

Removed.