Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Added @aaron.ballman as reviewer, because he was reviewer of related patch: 878e590503dff
Adding some other reviewers for more opinions while I ponder the changes. Sema seems like the wrong place to perform such concatenations, but given that adjacent string literals are concatenated during phase 6, I'm not certain it's actually observable.
clang/include/clang/Parse/Parser.h | ||
---|---|---|
578–582 ↗ | (On Diff #538025) | Unless you find a better name, I think it's preferable to keep isTokenConcatenable and isTokenPredefinedMsMacro as separate functions. |
clang/lib/Parse/ParseExpr.cpp | ||
3288–3296 | I think I'd prefer ConsumeAnyToken with an assert that checks it's a stringLiteral or a predefined ms exppression | |
clang/lib/Sema/SemaExpr.cpp | ||
2034–2070 | Can we put that logic in a separate function? | |
2043 | can you assert it's a string literal otherwise? | |
2051–2066 | I think it might be easier to create a string_literal token directly here. I'm also not sure we need to use Lexer::Stringify | |
clang/test/Sema/ms_predefined_expr.cpp | ||
12 | do we have any tests that look at the values of these things? |
clang/lib/Parse/ParseExpr.cpp | ||
---|---|---|
1300–1301 | Can we instead, look at NextToken() and if next token is a StringLiteral or a MS predefined extension we fallthrough? |
Addressed review comments.
I agree, but I couldn't come up with anything better (having limited knowledge about Clang).
Other possible ugly solution: make actual concatenation in StringLiteralParser, but this would require either of:
- Pass Decl* into StringLiteralParser - this requires dependency (and linkage) of Lex with AST.
- Pre-calculate function names for all possible types of tokens (__FUNCTION__, __FUNCDNAME__, etc.) in ActOnStringLiteral and pass into StringLiteralParser
- Find other way of obtaining current function name in Lex.
One may wish to expand these macros during phase (4), but these macros are predefined identifiers in non-MSVC compilers, and MSVC don't expand them as macros (-E compiler flag); and we don't have knowledge about AST (not to mention the function name) during phase (4).
I didn't understand this part.
clang/include/clang/Parse/Parser.h | ||
---|---|---|
578–582 ↗ | (On Diff #538025) | Regarding getLangOpts().MicrosoftExt. If you are talking about it's presence in a function which name is meant to be used as a predicate, I agree. If you are talking about class Parser, then there're other places with references to getLangOpts(). Without such function ParseStringLiteralExpression implementation would be too verbose. Meanwhile, I renamed it to isTokenLikeStringLiteral(). |
clang/lib/Parse/ParseExpr.cpp | ||
1300–1301 | I thought of that, but I was afraid that playing with fallthrough wasn't appreciated. | |
3288–3296 | Done. But do we really need an assertion here? We have one in the function preamble and strict condition in while. | |
clang/lib/Sema/SemaExpr.cpp | ||
2034–2070 | Done. Tho, I couldn't make it const (for the same reason I couldn't make getCurScopeDecl() const). And I wasn't sure about the interface: std::vector<Token> ExpandMSPredefinedMacros(ArrayRef<Token> Toks); vs void ExpandMSPredefinedMacros(MutableArrayRef<Token> Toks); | |
2043 | Done. | |
2051–2066 |
What do you mean? Is there a function which creates Token object from StringRef? Or is there a way to associate string literal value with a Token without PP? I would like to simplify it, but I haven't found other ways of achieving the same result.
Well, as far as I can see StringLiteralParser expands escape sequences. So, I am just being too careful here. | |
clang/test/Sema/ms_predefined_expr.cpp | ||
12 | clang/test/Analysis/eval-predefined-exprs.cpp clang/test/AST/Interp/literals.cpp clang/test/SemaCXX/source_location.cpp clang/test/SemaCXX/predefined-expr.cpp |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
118–120 | I think it makes since to be more explicit that the concatenation is with regard to string literals. | |
clang/include/clang/Basic/TokenKinds.h | ||
87–93 ↗ | (On Diff #540427) | |
clang/include/clang/Parse/Parser.h | ||
578–582 ↗ | (On Diff #538025) | I suggest passing getLangOpts() to isFunctionLocalPredefinedMacro (isUnexpandableMsMacro) and letting it determine whether the token is or is not one of these special not-really-a-string-literal macro things. This will facilitate additional such macros controlled by different options while also colocating the option check with the related tokens. |
clang/include/clang/Sema/Sema.h | ||
5705 | ||
clang/lib/Parse/ParseExpr.cpp | ||
1301–1312 | Since the conditional code is only relevant for some of the tokens here, I would prefer to see the other token kinds (kw___func__, kw___PRETTY_FUNCTION__) handled separately to avoid the conditional fallthrough. That means duplicating the calls to Actions.ActOnPredefinedExpr() and ConsumeToken() between the blocks, but that bothers me less than having to understand the complicated condition. | |
3282–3285 | I think __func__ is not considered a string literal in Microsoft mode, so I think this comment needs some adjusting. | |
3292–3295 | What is the reason for requiring non-concatenated predefined function local macros to be handled by ActOnPredefinedExpr()? | |
clang/lib/Sema/Sema.cpp | ||
1494 | getCurScopeDecl isn't a great name since this doesn't handle class or namespace scopes. How about getCurLocalScopeDecl? That name isn't quite right either since this can return a TranslationUnitDecl, but I think it matches the intent fairly well. None of the callers need the actual TranslationUnitDecl that is returned. I think this function can return nullptr is the current scope isn't function local and callers can issue the relevant diagnostic in that case (thus avoiding the need to check that a translation unit decl was returned). | |
clang/lib/Sema/SemaExpr.cpp | ||
1975–1978 | ||
2005–2008 | A diagnostic is issued if the call to getCurScopeDecl() returned a translation unit declaration (at least in Microsoft mode). Does it make sense to pass a pointer to a TranslationUnitDecl here? | |
3709 | This can just be a nullptr check with my suggested change to getCurScopeDecl(). | |
clang/test/Sema/ms_predefined_expr.cpp | ||
170 | If we don't already have such tests, please add tests that exercise these macros at global, namespace, and class scope. |
Addressed review comments, rebased onto main.
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) | Thanks, I like the name. But shouldn't we reflect that we are referring to only Microsoft (unexpandable) macros? How about isFunctionLocalPredefinedMsMacro? |
clang/include/clang/Parse/Parser.h | ||
578–582 ↗ | (On Diff #538025) | @tahonermann, in theory it sounds good, but if I understood you correctly, the implementation seem weird to me: inline bool isFunctionLocalPredefinedMsMacro(TokenKind K, const LangOptions& langOpts) { if (!langOpts.MicrosoftExt) return false; return K == tok::kw___FUNCTION__ || K == tok::kw___FUNCSIG__ || K == tok::kw_L__FUNCTION__ || K == tok::kw_L__FUNCSIG__ || K == tok::kw___FUNCDNAME__; } And I would have to add #include "LangOptions.h" in TokenKinds.h. |
578–582 ↗ | (On Diff #538025) | If you are still concerned about usage of LangOptions in this context, and at the same time you are not against of having such function, I can offer the following alternatives:
I went with (2) to do simplification of condition in switch statement below. Marking thread resolved, as it's no longer relevant. |
clang/include/clang/Sema/Sema.h | ||
5705 | Renamed to ExpandFunctionLocalPredefinedMsMacros. If you think my addition of Ms is redundant, let me know. | |
clang/lib/Parse/ParseExpr.cpp | ||
1301–1312 | That's a valid point. And by handling two tokens separately allows simplification of the condition. Adjusted code. | |
3282–3285 | Right, removed mention of __func__. | |
3292–3295 | To ensure generation of the same AST as before my changes (with PredefinedExpr): | | `-VarDecl <col:2, col:19> col:13 a 'const char[2]' cinit | | `-PredefinedExpr <col:19> 'const char[2]' __FUNCTION__ | | `-StringLiteral <col:19> 'const char[2]' "f" This actually brings a question: do we want to wrap StringLiterals (into PredefinedExpr) which were formed from concatenation of string-literals and some predefined identifier(s)? But it does not really make any sense: concatenated string does not represent any of PredefinedExpr. | |
clang/lib/Sema/Sema.cpp | ||
1494 | Sounds good to me. The only alternative I can come up with is getCurFunctionScopeDecl, but is it any better? Adjusted code as per your suggestions. | |
clang/lib/Sema/SemaExpr.cpp | ||
1975–1978 | Changed, thank you! | |
2005–2008 | Shortly, yes, kind of. ComputeName can handle TranslationUnitDecl in which case it returns an empty string. This way we implicitly (without additional code) create empty string-literal Tokens which can be handled in StringLiteralParser. And we cannot pass non-string-literal tokens into StringLiteralParser. | |
3709 | Done, but I had to undo removal of currentDecl = Context.getTranslationUnitDecl(); but it is not a big deal. |
clang/lib/Parse/ParseExpr.cpp | ||
---|---|---|
3288–3296 | Looking at that again, i think you can remove the assert (the one at the top should stay). which you did, excellent. sorry about that! | |
3292–3295 | Yes, this make sense. I think it's better than adding more complexity / memory footprint to StringLiteral. | |
clang/lib/Sema/SemaExpr.cpp | ||
2034–2070 | I think the later let you use a small vector so it's probably more efficient. I'm find with either | |
2051–2066 | Thanks! I really would get rid of Stringify here - I struggle to see how a function could be produced that has characters that cannot appear in an identifier. even asserting isn't very useful. | |
clang/test/Sema/ms_predefined_expr.cpp | ||
12 | I think it's worth add a few more tests in clang/test/SemaCXX/predefined-expr.cpp checking concatenations |
Addressed review comments, rebased onto main
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2051–2066 | Replaced Stringify with enclosure into raw-string-literal. Despite raw-string-literals are available starting from C++11, as far as I can see StringLiteralParser does not check for the standard. | |
clang/test/Sema/ms_predefined_expr.cpp | ||
12 | Added tests to check values of these string literals, as well as moved ms_wide_predefined_expr.cpp tests into ms_predefined_expr.cpp. |
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) | I don't think the Microsoft association is meaningful. Sure, some of the predefined macros are only treated differently when compiling in Microsoft mode, but some of those macros aren't Microsoft specific. Also, there might be macros provided by other implementations that we'll want to emulate some day. |
clang/include/clang/Sema/Sema.h | ||
5705 | I don't think the Ms is redundant, but I do think it is unnecessary and slightly confusing (__FUNCTION__) is not a Microsoft macro. | |
clang/lib/Parse/ParseExpr.cpp | ||
1307–1308 | TokenIsLikeStringLiteral() checks MicrosoftExt, so the check here is redundant. This is an example of why I would like to see the MicrosoftExt checking pushed down to isFunctionLocalPredefinedMsMacro(); that really seems where that checking belongs to me. | |
3293–3294 | Is there a missing check for MicrosoftExt here? This is another example of why I would prefer to see those checks pushed down to isFunctionLocalPredefinedMsMacro(). | |
clang/lib/Sema/SemaExpr.cpp | ||
1983–1984 | This could use a comment since it isn't obvious why the translation unit declaration is used when not in a local scope declaration of some kind. | |
2005–2008 | Ah, ok. And I see it even differentiates what name is produced for __PRETTY_FUNCTION__. That leaves me wondering what the right behavior should be at class and namespace scope, but maybe I'm better off not asking questions I don't want to know the answer to. |
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) | I think it is, there is currently no way to link isFunctionLocalPredefinedMacro to the MS feature. "MSPredefinedMacro" is pretty self explanatory, same reason we most often - but not always - use GNU in the name of function related to GNU extensions. |
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) | Perhaps we still haven't found the right name then. With the name that I've been advocating, this function should also return true for tok::kw___PRETTY_FUNCTION__ even though that macro should not expand to something that can be concatenated with other string literals (whether compiling in Microsoft mode or not). The Microsoft extension is the localized expansion to a string literal that can be concatenated with other string literals. We probably should isolate the conditions for that behavior to one place and if we do that, then I guess naming this function as specific to Microsoft mode is ok; we can always rename it later if it gets used for non-Microsoft extensions. Here is my suggestion then: // Return true if the token corresponds to a function local predefined // macro that, in Microsoft modes, expands to a string literal (that can // then be concatenated with other string literals). inline bool isMsFunctionLocalStringLiteralMacro(TokenKind K, const LangOptions &langOpts) { return langOpts.MicrosoftExt && ( K == tok::kw___FUNCTION__ || K == tok::kw_L__FUNCTION__ || K == tok::kw___FUNCSIG__ || K == tok::kw_L__FUNCSIG__ || K == tok::kw___FUNCDNAME__); } This will avoid the need for callers to check for MicrosoftExt; that is what I really want to avoid since it is easy to forget to do that check. |
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) |
Personally, I like version with LangOptions and removal of MS. |
Addressed review comments, rebased onto main
clang/lib/Parse/ParseExpr.cpp | ||
---|---|---|
1301–1312 | Having TokenIsLikeStringLiteral I think we can return to single case. I believe 3 values in condition is not difficult. | |
1307–1308 | Let's consider this code: if (!TokenIsLikeStringLiteral(NextToken(), getLangOpts())) { Res = Actions.ActOnPredefinedExpr(Tok.getLocation(), SavedKind); ConsumeToken(); break; } The condition can be expanded as follows: !(isStringLiteral || (MsExt && NeededToken)). Let's consider MsExt is false, then we have: !isStringLiteral. So, if next token is StringLiteral we will try to concatenate it even without MsExt, which is invalid. Thus this seemingly redundant check is needed. | |
3293–3294 | My intention for this assert is to ensure that we keep treating single predefined identifiers as PredefinedExpr (when we don't need to concat them) to preserve AST. I've adjusted the check to make it more clear. |
I think this is looking good. The only big thing I noticed is some code that looks like it should have been removed with the last set of changes. Otherwise, just some minor concerns and suggestions.
clang/lib/Parse/ParseExpr.cpp | ||
---|---|---|
1294–1296 | I believe this code should be removed now; it is no longer reachable. | |
1301–1312 | I think this is good, but how about a comment to make the intent more clear? | |
clang/lib/Sema/SemaExpr.cpp | ||
1903 | ||
2014–2017 | "EFLPM"? I'm assuming this is an attempt to avoid an accidental clash with the computed name. Does this suffice to ensure a clash can't happen? If not, can we do better? Per http://eel.is/c++draft/lex.string#nt:r-char and http://eel.is/c++draft/lex.charset#1, we have a lot of flexibility regarding which characters to use. | |
clang/test/Sema/ms_predefined_expr.cpp | ||
115 | How about testing some other encoding prefix combinations? u"" __FUNCTION // Ok; UTF-16 string literal u"" L__FUNCTION__ // ill-formed. |
Thanks to your suggestion of testing different types, I realized clang does not support MSVC macros with u, u8, and U prefixes in addition to L__FUNCDNAME. By the way, clang replicates MSVC behavior a little bit incorrectly: L__FUNCTION__ is not a valid token for MSVC, however it becomes valid if used via macro concatenation (see #define WIDE).
I think addition of support of new macros as well as bug fix (if possible) should be submitted separately. I'll open-up an issue, and submit a PR later.
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2014–2017 | At first I thought you were hinting me to use non-ascii characters for d-char-sequence. However, when I looked closely d-char-sequence is not as flexible as r-chars that you referenced: http://eel.is/c++draft/lex.string#nt:d-char and http://eel.is/c++draft/lex.charset#2 Here're my thoughts:
| |
clang/test/Sema/ms_predefined_expr.cpp | ||
115 | Good idea. I am not sure whether I should specify -std in the test command. I'll leave it without, because current default standard is C++17, and I believe it's not going to be decreased. And I think there are enough tests of checking values of these macros. So, in tests for encoding I am going to check types. |
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) |
I don't think there is any motivation to future proof against hypotheticals, we can always refactor later
This makes me uneasy, but i think we can move the function to LiteralSupport.h and include that in ParseExpr.cpp. |
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2014–2017 | Sorry about that; you are absolutely right that d-char-sequence is (much) more constrained than r-char-sequence. For __FUNCSIG__, the generated text can include arbitrary text. For example, given this declaration: void f(decltype(sizeof("()"))*) the macro value is: void __cdecl f(decltype(sizeof ("()")) *) which does contain )". I think we should do more to avoid any possibility of the resulting string being unparseable because the resulting diagnostics will be completely inscrutable. The best way to do that would be to avoid trying to produce a parseable string literal in the first place. Based on that and given that Sema::ActOnStringLiteral() immediately uses StringLiteralParser to produce an expanded string literal, I think we would be better off moving this expansion there such that these macros can be expanded to already cooked string literals that don't need to be parsed at all. This will mean having to modify the StringLiteralParser constructor to accept a Decl* pointer that will be passed getCurLocalScopeDecl() in Sema::ActOnStringLiteral(). StringLiteralParser can then assert if it ever encounters one of these function local predefined tokens when it hasn't been provided a corresponding Decl* to use with them. | |
clang/test/Sema/ms_predefined_expr.cpp | ||
115 | Thank you for adding those. I think we should check values for a couple of cases as well. Consider a function name that contains 𐀀 (U+10000 LINEAR B SYLLABLE B008 A). In UTF-8, that character requires four code units (F0 90 80 80) while in UTF-16 it requires two (D800 DC00). We'll want to ensure that such characters are properly converted between encodings. void test_encoding𐀀() { ASSERT_EQ(u"" __FUNCTION__, u"test_encoding𐀀"); } |
clang/test/Sema/ms_predefined_expr.cpp | ||
---|---|---|
170 | Here is another case to test. MSVC accepts this. void test_static_assert() { static_assert(true, __FUNCTION__); } |
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2014–2017 | I didn't think about that scenario but in that case the original idea to use Lexer::Stringify seems like a cleaner solution to me. In that light, Lexer::Stringify seems like the least worse option. |
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2014–2017 | Yeah, that might be the simpler solution. That matches how __FILE__, __BASE_FILE__, and __FILE_NAME__ are handled in Preprocessor::ExpandBuiltinMacro(). |
Addressed review comments, rebased onto main
Noticable changes:
- Diagnostics message
- Support of expansion of these attributes in 2nd argument of static_assert
- Moved helper-functions to Lex/LiteralSupport
- Changed back to usage of Lexer::Stringify for function name expansion
Local lit clang/test succeed for both Debug and Release builds.
clang/include/clang/Basic/TokenKinds.h | ||
---|---|---|
87–93 ↗ | (On Diff #540427) | Sounds good to me. Due to other changes I also moved TokenIsLikeStringLiteral there.
Yes, not a strong argument, but I'd like to keep current name if you are not against. |
clang/lib/Sema/SemaExpr.cpp | ||
2014–2017 | Thanks for pointing out good counter-example.
Unfortunately, yes. In regards other possible implementations, I commented previously here: https://reviews.llvm.org/D153914#4500807
This requires dependency (and linkage) of Lex with AST. I am going to add your counter-examples to test, and change implementation back to Lexer::Stringify | |
clang/test/Sema/ms_predefined_expr.cpp | ||
170 | Thanks for finding this! I had to do some adjustments to support it. In addition I changed diagnostics message because it no longer made sense, and removed assert about "single predefined identifiers shall be handled by ActOnPredefinedExpr". Other code around references of err_expected_string_literal do not seem to require support of this MS ext. |
Removed irrelevant changes
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
1981 | I think I should remove this assertion so this function would be usable without MS ext, on the other hand it would be a noop without MS ext. |
I think I'm happy with this. I noted two code changes that I think are no longer needed and offered some final suggestions on some names.
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
118–120 | For consistency with other names, I'll suggest some alternative names here. I think these better reflect the actual diagnostic message.
| |
clang/include/clang/Basic/TokenKinds.h | ||
22–23 ↗ | (On Diff #544085) | It looks like this change is no longer needed. |
clang/lib/Basic/TokenKinds.cpp | ||
14–15 ↗ | (On Diff #544085) | This change appears to no longer be needed. |
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
1981 | I suggest leaving it in solely because use of the function does impose some overhead in the construction of the std::vector that is returned. If that overhead can be avoided (I guess by passing the container to populate by reference as a separate argument and then return either the original ArrayRef or a new one referencing the populated container), then this could be made a no-op and the checks for MicrosoftExt at the call sites could be removed. |
I'm happy with this. I noted one last rename suggestion to maintain consistency, but am accepting. Please give Corentin a final chance to raise any concerns. Thank you for being so responsive through all the review iterations!
clang/include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
1210 | We should probably name the diag group to match.
|
Thank you for the review! I apologize for missing the small details; I should have noticed and addressed them myself.
Rebased onto main, run local lit clang/tests. (bump for @cor3ntin)
As far as I understand this shall be merged into main by maintainers.
Please let me know if something else is expected from me.
clang/lib/Sema/Sema.cpp | ||
---|---|---|
1494–1505 | Made the function const because it's not writing to any fields, removed else because of preceding return. | |
clang/lib/Sema/SemaExpr.cpp | ||
1986 | Matching our usual naming conventions. | |
1998–2000 | This will miss the diagnostic if the current declaration is a namespace instead of at file scope, right? | |
2034–2036 | Probably worth a comment here explaining that ExpandedToks acts as the backing store for StringToks, which is why both are being assigned to here. |
Thanks for asking; keep the name/email as-is in the commit you see: Author: Richard Dzenis <dzenis@richard.lv>
clang/lib/Sema/Sema.cpp | ||
---|---|---|
1494–1505 | I would love to, and I tried. But unfortunately it's not possible without introducing more changes: all the member functions (except getCurFunctionOrMethodDecl()) are non-const. | |
clang/lib/Sema/SemaExpr.cpp | ||
1998–2000 | Right. But getCurLocalScopeDecl() returns function scope at most. See Note in a code comment above. | |
2005–2008 |
|
Updated comment; rebased onto main
clang/lib/Sema/SemaExpr.cpp | ||
---|---|---|
2034–2035 | Thanks, that's better. Changed in both places. |
We should probably name the diag group to match.