This is an archive of the discontinued LLVM Phabricator instance.

[ms] Add support for parsing uuid as a MS attribute
ClosedPublic

Authored by thakis on Aug 25 2016, 4:32 PM.

Details

Summary

Some Windows SDK classes, for example Windows::Storage::Streams::IBufferByteAccess, use the ATL way of spelling attributes:

[uuid("....")] class IBufferByteAccess {};

To be able to use __uuidof() to grab the uuid off these types, clang needs to support uuid as a Microsoft attribute. There was already code to skip Microsoft attributes, extend that to look for uuid and parse it. Add a new "MS" attribute type for this syntax.

There's already a function that moves attributes off the declspec into an attribute list for attributes applying to the type, teach that function to also move MS attributes around. This requires that MS attributes make it to the DS in the first place, so move the call to MaybeParseMicrosoftAttributes() into ParseDeclOrFunctionDefInternal() so that it can store its results in DS.getAttributes().

Diff Detail

Event Timeline

thakis updated this revision to Diff 69302.Aug 25 2016, 4:32 PM
thakis retitled this revision from to [ms] Add support for parsing uuid as a MS attribute.
thakis updated this object.
thakis added a reviewer: rnk.
thakis added a subscriber: cfe-commits.
rsmith added inline comments.
include/clang/Basic/Attr.td
201

Is there some better description for this than MS? __declspec is also an MS attribute. Is it fair to call this IDL or MSIDL or similar?

include/clang/Parse/Parser.h
2109–2111

Can you give this a less vague-sounding name? :)

lib/Parse/ParseDeclCXX.cpp
3950

Is anything known to fail if you accept arbitrary attributes here, or is this just the only one you /know/ you need right now? (If we can ditch the whitelist, that would seem preferable.)

rnk edited edge metadata.Aug 25 2016, 5:46 PM

I think these are known as "IDL attributes":
https://msdn.microsoft.com/en-us/library/8tesw2eh.aspx

Let's update the naming to use that terminology, so AS_MS should be AS_IDL, and MaybeParseMicrosoftAttributes should be MaybeParseMicrosoftIDLAttributes, etc.

Also, doesn't this introduce ambiguities into the grammar? Something like this:

void useit(int);
int main() {
  int uuid = 42;
  [uuid]() {
    useit(uuid);
  }();
}

Will we keep parsing that as a lambda after this change or not?

thakis marked an inline comment as done.Aug 25 2016, 7:06 PM

Thanks for the fast review and the good comments! I'll leave it to y'all to agree on some name if you don't like the one I picked.

Also, doesn't this introduce ambiguities into the grammar? Something like this:

void useit(int);
int main() {
  int uuid = 42;
  [uuid]() {
    useit(uuid);
  }();
}

Will we keep parsing that as a lambda after this change or not?

This change shouldn't change how things are parsed. It moved calls of MaybeParseMicrosoftAttributes(attrs) that precede ParseExternalDeclaration(attrs) into ParseExternalDeclaration(). As far as I can tell, every call of ParseExternalDeclaration() was preceded by a call to MaybeParseMicrosoftAttributes(), so this should be behavior-preserving. In your example, the lamda is not an external declaration, so it's not affected. (That means [uuid(...)] won't work for local classes, but that's probably ok.)

Now that I looked at this more, behavior changes slightly for empty external declarations. Before,

[foob];

didn't produce any output, now it produces "declaration doesn't declare anything". I don't know if that's important.

include/clang/Basic/Attr.td
201

There are many IDL compilers. MSIDL works for me, so does anything else you 3 agree on :-)

include/clang/Parse/Parser.h
2109–2111

Changed to stripTypeAttribsOffDeclSpec

lib/Parse/ParseDeclCXX.cpp
3950

These attributes can be different from decl spec args. For example, we don't have anything that parses threading("apartment"), module(name="MyLib") at the moment (I suppose ParseAttributeArgsCommon does that at a token level, but it won't know all these attributes that are allowed here).

thakis updated this revision to Diff 69312.Aug 25 2016, 7:09 PM
thakis edited edge metadata.
thakis marked an inline comment as done.

comments round 1

thakis updated this revision to Diff 69419.Aug 26 2016, 12:02 PM

Aaron's right that these are currently called "Microsoft" in the source. I agree that that's a confusing name and we should probably rename it, but on second thought that seems unrelated to this change. If y'all agree on a name, I can do the renaming in a separate change. I changed the patch to use "Microsoft" until then. I also realized that this was missing a tablegen change; now included.

rnk added a comment.Aug 29 2016, 11:00 AM

The code looks good and I'm happy to leave the naming as it is for now. Apparently MSVC is warning on these attributes and calling them "ATL attributes" now: https://msdn.microsoft.com/en-us/library/mt723604.aspx I'm not sure if that warning applies to all square-bracketed user defined attributes or not, but the naming is certainly confusing.

Even though this change doesn't change the behavior of the parser, I would feel more comfortable if we added a new test at test/SemaCXX/ms-square-bracket-attributes.cpp or something that exhaustively lists the ambiguities we know about and codifies how we diagnose them. That way, if someone accidentally changes behavior in this area, we'll know about it. This seems like a good starting point for the test:

void local_class() {
  // FIXME: MSVC accepts, but we reject due to ambiguity.
  // expected-error@+1 {{expected body of lambda expression}}
  [uuid("a5a7bd07-3b14-49bc-9399-de066d4d72cd")] struct Local {
    int x;
  };
}

void useit(int);

int lambda() {
  int uuid = 42;
  [uuid]() {
    useit(uuid);
  }();
}
aaron.ballman requested changes to this revision.Aug 29 2016, 2:29 PM
aaron.ballman edited edge metadata.

You are removing a lot of instances of MaybeParseMicrosoftAttributes(), but I'm not certain I understand why. Can you describe why you need to remove those?

There are things missing from this patch as well. To wit:

  • Attributes.h needs to know about this attribute syntax, and hasAttribute() needs to support it (this hooks into __has_attribute support).
  • AttributeList.h should be updated to expose the parsed attribute syntax.
  • Both of these changes should be threaded through ClangAttrEmitter.cpp.
  • We should consider adding (perhaps as a separate patch?) __has_microsoft_attribute() to complement __has_declspec_attribute().
  • This change should probably go into the release notes.

I agree with @rnk, I would like to see more comprehensive tests of what we do and do not support for the parsing of these attributes. Previously, we didn't support Microsoft attributes at all, so it was fine to just eat them whenever they appeared and not worry about whether we were actually parsing them properly or not. Now that we want to support them, I want to make sure the parsing is correct, even if uuid wouldn't show up in that particular position. I am fine if your tests show things we don't parse properly yet with FIXME comments (assuming that your patch does not regress that behavior). However, I'd prefer those tests appear in Parser rather than SemaCXX. ;-)

include/clang/Parse/Parser.h
2109–2110

Better, but how about stripTypeAttributesOffDeclSpec? We don't use "attribs" anywhere else.

include/clang/Sema/AttributeList.h
111

Please remove the trailing comma.

lib/Parse/ParseDecl.cpp
516–518

This change is not incorrect, but I think you need a custom parsing step as well. IIRC, the uuid attribute supports unquoted GUIDs, like [uuid(A-B-C-D)], and this won't handle it properly. Attr.td says UUID takes a StringArgument, which may require a bit of thought as to whether we need a flag on there to say "or... maybe not a string...for Microsoft attributes only". Perhaps other MS attributes do the same thing and we're okay for our declarative information?

1427–1428

I'd spell this out a bit more.

// Usually, `__attribute__((attrib)) class Foo {} var;` means that the attribute appertains to the declaration (var) rather than the type (Foo).
1431–1434

Is this documented somewhere, or just what you gather from experimentation?

utils/TableGen/ClangAttrEmitter.cpp
2372 ↗(On Diff #69419)

I think (but my memory is a bit fuzzy here) that these values need to match the ones from AttributeList.h. I think it's a bug that Pragma was at 4 rather than 5, but since __has_attribute doesn't care about pragmas (or context sensitive keywords), it's likely benign.

This revision now requires changes to proceed.Aug 29 2016, 2:29 PM
rnk added a comment.Aug 29 2016, 4:52 PM

You are removing a lot of instances of MaybeParseMicrosoftAttributes(), but I'm not certain I understand why. Can you describe why you need to remove those?

That shouldn't change functionality, Nico wrote: "It moved calls of MaybeParseMicrosoftAttributes(attrs) that precede ParseExternalDeclaration(attrs) into ParseExternalDeclaration(). As far as I can tell, every call of ParseExternalDeclaration() was preceded by a call to MaybeParseMicrosoftAttributes(), so this should be behavior-preserving."

include/clang/Sema/AttributeList.h
111

IMO trailing commas in long lists are nice. There's a reason they got added to C++11.

thakis marked 3 inline comments as done.Aug 30 2016, 11:34 AM

There are things missing from this patch as well. To wit:

  • Attributes.h needs to know about this attribute syntax, and hasAttribute() needs to support it (this hooks into __has_attribute support).
  • AttributeList.h should be updated to expose the parsed attribute syntax.
  • Both of these changes should be threaded through ClangAttrEmitter.cpp.

All these are already part of this patch as far as I can tell. What am I missing?

lib/Parse/ParseDecl.cpp
1427–1428

(ish)

1431–1434

Experimentation.

utils/TableGen/ClangAttrEmitter.cpp
2372 ↗(On Diff #69419)

I can change pragma to map to one more in a separate patch if you want.

thakis marked 2 inline comments as done.Aug 30 2016, 11:36 AM

(Just wanted to ask if I'm missing something about these three bullets. I do need to make a few other changes: More tests, parsing for uuid(1-2-3) without quotes + tests, and I also want to add a deprecation warning with a fixit, given that msvs does that too)

There are things missing from this patch as well. To wit:

  • Attributes.h needs to know about this attribute syntax, and hasAttribute() needs to support it (this hooks into __has_attribute support).
  • AttributeList.h should be updated to expose the parsed attribute syntax.
  • Both of these changes should be threaded through ClangAttrEmitter.cpp.

All these are already part of this patch as far as I can tell. What am I missing?

This is what happens when I split my review into two sessions. I think I was just tired, because I now see that you did change Attributes.h and AttributeList.h properly. Sorry about the noise! However, I think GenerateHasAttrSpellingStringSwitch() should be updated (the Microsoft variety shouldn't work if Microsoft extensions are disabled), but otherwise ClangAttrEmitter.cpp seems reasonable.

lib/Parse/ParseDecl.cpp
1431–1434

It might be good to clarify that in the comment so we don't come along later and think it's a hard-and-fast rule.

utils/TableGen/ClangAttrEmitter.cpp
2372 ↗(On Diff #69419)

Yeah, or I can take care of it, either way is fine by me.

thakis updated this revision to Diff 70013.Sep 1 2016, 8:10 AM
thakis edited edge metadata.
thakis marked an inline comment as done.

Sorry about the delay! Now with custom parsing code that accepts uuid() without quotes. I thought this would be complicated, but ended up being not so bad in the end. See the tests for several interesting edge cases.

I figured I'd punt the deprecation warning to a (fast) follow-up.

rnk added inline comments.Sep 1 2016, 8:19 AM
lib/Parse/ParseDeclCXX.cpp
3943

This isn't so bad. :) I was expecting you'd have to change the lexer.

test/Parser/ms-square-bracket-attributes.mm
138 ↗(On Diff #70013)

nice

aaron.ballman added inline comments.Sep 1 2016, 11:30 AM
include/clang/Parse/Parser.h
2217

Attrs instead of attrs

lib/Parse/ParseDeclCXX.cpp
3947

Should we use Twine for building this string up (since this is used in a lot of header files)?

3950

How well does this diagnose [uuid{"000000A0-0000-0000-C000-000000000049"}] struct S {};? (Note the use of curly braces in lieu of parens for the argument.)

3993

Will this properly handle an abomination involving string literal concatenation? I shudder to type this, but: [uuid({000000A0-0000-""0000-C000-000000000049})] struct S {}; MSVC rejects, but I think this code will silently accept it.

test/Parser/ms-square-bracket-attributes.mm
12 ↗(On Diff #70013)

Can we also get a test case showing the behavior of multiple arguments in the same uuid attribute? e.g., [uuid("{000000A0-0000-0000-C000-000000000049}", "1")] struct S {};
And a test showing that we reject multiple uuid attributes on the same entity? e.g., [uuid("{000000A0-0000-0000-C000-000000000049}"), uuid("{000000A0-0000-0000-C000-000000000050}")] struct S {};

thakis updated this revision to Diff 70036.Sep 1 2016, 11:52 AM
thakis edited edge metadata.
thakis marked 6 inline comments as done.

Address comments.

thakis added inline comments.Sep 1 2016, 11:53 AM
include/clang/Parse/Parser.h
2217

It's spelled attrs in the function above and below this one…

lib/Parse/ParseDeclCXX.cpp
3947

I thought Twine is a rvalue type…yes, from its docs:

/// A Twine is not intended for use directly and should not be stored, its
/// implementation relies on the ability to store pointers to temporary stack
/// objects which may be deallocated at the end of a statement. Twines should
/// only be used accepted as const references in arguments, when an API wishes
/// to accept possibly-concatenated strings.

I think this is should be pretty fast as-is.

3950

That's currently silently ignored since ParseMicrosoftAttributes() looks for 'uuid' '(' to call the uuid parser.

3993

This is rejected, "" is returned as a string token which becomes "" (including quotes) after getSpelling. Added a test case with this.

test/Parser/ms-square-bracket-attributes.mm
12 ↗(On Diff #70013)

The first is a nice testcase, added, thanks.

We don't currently reject multiple uuids (also not with declspec). Fixing this is on my list of follow-ups, but since the fix will live in sema land and be independent of the attribute spelling, I didn't put it in this change. (Fun fact: cl.exe doesn't warn if you have one []-uuid and one declspec-uuid. We'll get that right once we diag on it.)

aaron.ballman added inline comments.Sep 1 2016, 12:17 PM
lib/Parse/ParseDeclCXX.cpp
3950

Silently ignoring seems like the wrong thing to do -- can we diagnose (even if it's a less-than-ideal diagnostics with a fixme)?

test/Parser/ms-square-bracket-attributes.mm
13 ↗(On Diff #70036)

Diagnosing the second case in a follow-up patch is totally fine by me.

rsmith added inline comments.Sep 1 2016, 1:35 PM
include/clang/Basic/Attr.td
201

Given that MS have deprecated this syntax and are trying to move away from it towards __declspec, I think calling this "Microsoft" is a mistake. As MS' documentation calls these "ATL attributes", it would seem reasonable for us to do the same.

thakis added inline comments.Sep 1 2016, 1:48 PM
lib/Parse/ParseDeclCXX.cpp
3950

We still skip the majority of [] contents. Maybe the token 'uuid' can appear in some other attribute, followed by something else. So we probably shouldn't diag on 'uuid' followed by not-'(' (right?). Do you want me to add a diagnostic for 'uuid' '{'? What about 'uuid' '['?

aaron.ballman added inline comments.Sep 1 2016, 1:58 PM
lib/Parse/ParseDeclCXX.cpp
3950

The grammar for the uuid attribute shows it requires the parens (it also shows that it only accepts a string literal, so take that with a grain of salt), so I think we should diagnose in this case, especially since we're manually parsing the args. So if you see "[uuid" followed by any token other than "(", I think that's an error (and MSVC treats it as such).

thakis added inline comments.Sep 1 2016, 2:35 PM
lib/Parse/ParseDeclCXX.cpp
3950

Sure, but uuid could be preceded by other tokens, e.g.

[ someotherattrib(foobar...) ..., uuid {

someotherattrib might take an argument that's called uuid and it might be valid to have uuid followed by something not '(' there, say [identify_class_by(uuid), uuid("1-2-3")]. This is a made-up example; I don't know if this is actually the case, I'm just saying I don't know, so I think I shouldn't diag on 'uuid' followed by something not-'(' in general.

Do you want me to peephole-diag on '[' 'uuid' not-'(' in the case when uuid is the first attrib in the attrib list?

thakis added inline comments.Sep 1 2016, 3:15 PM
include/clang/Basic/Attr.td
201

As said upthread, I'm happy to rename this to whatever you all agree on (or what you decree given that you're code owner), but they are called Microsoft attributes in today's code already, and renaming that is unrelated to this CL. If you think it's important, I'm happy to rename the existing code before lading this CL and then using the new name in this CL immediately.

Re "ATL attributes": I think some of the attributes in [] are SAL attributes which as far as I know is independent of ATL. Are you sure the documentation refers to the [] syntax, or could it refer to specific attributes in that syntax?

aaron.ballman added inline comments.Sep 1 2016, 3:20 PM
include/clang/Basic/Attr.td
201

MS' documentation does not call them "ATL attributes" only. The documentation also calls them COM attributes, compiler attributes, and other names. However, the compiler diagnoses them as "usage of ATL attributes is deprecated", so it's somewhat defensible as a name.

I think "ATL" is the wrong name for them *only here*, but I'm not opposed to making a sweeping change to name them ATL attributes everywhere in the compiler (including here). However, I also don't see an issue with continuing to call them Microsoft attributes, either, because it's unlikely to cause any lasting confusion (it's caused zero confusion in the years I've been working on Clang).

lib/Parse/ParseDeclCXX.cpp
3950

I'm a bit lost. You've processed the open square bracket, so you know you're in an attribute list. Then you ignore everything until you get an identifier. If that identifier is uuid, you do special processing. Right now, you require uuid( to do the special processing. What I am saying is that if, when processing an attribute-token within an attribute list, you find uuid followed by anything other than a (, it should be diagnosed.

So, I expect [identify_class_by(uuid), uuid("1-2-3")] to be fine, but [uuid["1-2-3-"], identify_by_class(uuid), uuid*] to diagnose the uuid["1-2-3"] and uuid* as being malformed (though I'd be fine if we skipped everything past the first malformed attribute if it's better for error recovery).

thakis added inline comments.Sep 1 2016, 3:33 PM
lib/Parse/ParseDeclCXX.cpp
3950

I too feel a bit lost :-)

Right now, the parsing code doesn't look at attribute lists, only at tokens, so [identify_class_by(uuid), skips tokens until we hit 'uuid', and if I were to diag on not-'(' after 'uuid', then I'd diag here, right? Do you want me to do some more parsing to look for comma-separated entities in the attribute list (and count parens brackets braces and ignore commas there) to fix this?

Sorry, I feel we're talking past each other :-(

aaron.ballman added inline comments.Sep 1 2016, 3:39 PM
lib/Parse/ParseDeclCXX.cpp
3950

I think we're getting to the crux of it, and I'm sorry for my being obtuse -- yes, I would like this to properly parse it as a list of attributes, rather than this eat-and-pray approach. The first element must be an identifier. If we get uuid, we process it as discussed. Any other identifier, we eat until we hit a ',', ']', or eof, and continue (if not ]).

aaron.ballman added inline comments.Sep 1 2016, 5:39 PM
include/clang/Basic/Attr.td
201

It seems that the "usage of ATL attributes is deprecated" means attributes appertaining to ATL functionality are deprecated, not that the attribute syntax is "ATL attribute" syntax which is deprecated. For instance, SAL attributes can be used and no diagnostic is emitted. Based on that, I think calling them ATL attributes is definitely incorrect.

lsc36 added a subscriber: lsc36.Sep 1 2016, 6:12 PM
thakis updated this revision to Diff 70197.Sep 2 2016, 11:10 AM

Require '('.

thakis marked an inline comment as done.Sep 2 2016, 11:12 AM
thakis added inline comments.
lib/Parse/ParseDeclCXX.cpp
3950

Aha, I had forgotten that SkipUntil() handles paren nesting intelligently, so I think just requiring a '(' after 'uuid' with the current code should work fairly well. Done, and added a test.

aaron.ballman accepted this revision.Sep 2 2016, 12:15 PM
aaron.ballman edited edge metadata.

LGTM with one small test case nit.

test/Parser/ms-square-bracket-attributes.mm
43 ↗(On Diff #70197)

For giggles, can you also add:

// expected-error@+1 {{expected ')'}}
[uuid("000000A0-0000-0000-C000-000000000049"}] struct T {};
This revision is now accepted and ready to land.Sep 2 2016, 12:15 PM
rsmith accepted this revision.Sep 2 2016, 2:12 PM
rsmith edited edge metadata.

For submit, can you separate the changes to generally support [] attribute syntax and the changes to parse [uuid(...)] into distinct commits? (I'm OK with the Microsoft attribute support piece landing with no tests since the uuid patch adds the relevant testing -- and I believe that piece by itself is NFC.)

test/Parser/ms-square-bracket-attributes.mm
129–131 ↗(On Diff #70197)

Please also test cases like

[uuid(00000000-0000-0000-0000-000000000000)] { return uuid; }
[uuid("00000000-0000-0000-0000-000000000000")] (int n) { return uuid[n]; }

(which are lambda *init-capture*s in C++14 onwards).

thakis updated this object.Sep 2 2016, 8:32 PM
thakis edited edge metadata.
thakis closed this revision.Sep 2 2016, 8:34 PM
thakis marked an inline comment as done.

Thanks! Landed in r280574 r280575 r280576 280578.