This is an archive of the discontinued LLVM Phabricator instance.

[clang-cl] Enable concatenation of predefined identifiers
ClosedPublic

Authored by RIscRIpt on Jun 27 2023, 1:29 PM.

Diff Detail

Event Timeline

RIscRIpt created this revision.Jun 27 2023, 1:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 1:29 PM
RIscRIpt requested review of this revision.Jun 27 2023, 1:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 1:29 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
RIscRIpt updated this revision to Diff 538025.Jul 7 2023, 1:56 AM

Rebased onto main

Added @aaron.ballman as reviewer, because he was reviewer of related patch: 878e590503dff

Adding some other reviewers for more opinions while I ponder the changes. Sema seems like the wrong place to perform such concatenations, but given that adjacent string literals are concatenated during phase 6, I'm not certain it's actually observable.

cor3ntin added inline comments.Jul 13 2023, 7:54 AM
clang/include/clang/Parse/Parser.h
578–582 ↗(On Diff #538025)

Unless you find a better name, I think it's preferable to keep isTokenConcatenable and isTokenPredefinedMsMacro as separate functions.
Also, this seems like a weird place to check for getLangOpts().MicrosoftExt

clang/lib/Parse/ParseExpr.cpp
3288–3296

I think I'd prefer ConsumeAnyToken with an assert that checks it's a stringLiteral or a predefined ms exppression

clang/lib/Sema/SemaExpr.cpp
2034–2070

Can we put that logic in a separate function?

2043

can you assert it's a string literal otherwise?

2051–2066

I think it might be easier to create a string_literal token directly here. I'm also not sure we need to use Lexer::Stringify

clang/test/Sema/ms_predefined_expr.cpp
12

do we have any tests that look at the values of these things?

cor3ntin added inline comments.Jul 13 2023, 8:20 AM
clang/lib/Parse/ParseExpr.cpp
1300–1301

Can we instead, look at NextToken() and if next token is a StringLiteral or a MS predefined extension we fallthrough?
That would avoid having duplicated logic in ParsePredefinedExpression and ActOnStringLiteral which would be a nice simplification

RIscRIpt updated this revision to Diff 540395.Jul 14 2023, 6:33 AM

Addressed review comments.

Sema seems like the wrong place to perform such concatenations, but given that adjacent string literals are concatenated during phase 6...

I agree, but I couldn't come up with anything better (having limited knowledge about Clang).
Other possible ugly solution: make actual concatenation in StringLiteralParser, but this would require either of:

  1. Pass Decl* into StringLiteralParser - this requires dependency (and linkage) of Lex with AST.
  2. Pre-calculate function names for all possible types of tokens (__FUNCTION__, __FUNCDNAME__, etc.) in ActOnStringLiteral and pass into StringLiteralParser
  3. Find other way of obtaining current function name in Lex.

One may wish to expand these macros during phase (4), but these macros are predefined identifiers in non-MSVC compilers, and MSVC don't expand them as macros (-E compiler flag); and we don't have knowledge about AST (not to mention the function name) during phase (4).

I'm not certain it's actually observable.

I didn't understand this part.

clang/include/clang/Parse/Parser.h
578–582 ↗(On Diff #538025)

Regarding getLangOpts().MicrosoftExt. If you are talking about it's presence in a function which name is meant to be used as a predicate, I agree. If you are talking about class Parser, then there're other places with references to getLangOpts().

Without such function ParseStringLiteralExpression implementation would be too verbose.
Let's decide what we can do after I address other comments.

Meanwhile, I renamed it to isTokenLikeStringLiteral().

clang/lib/Parse/ParseExpr.cpp
1300–1301

I thought of that, but I was afraid that playing with fallthrough wasn't appreciated.
Thanks, fixed.

3288–3296

Done. But do we really need an assertion here? We have one in the function preamble and strict condition in while.

clang/lib/Sema/SemaExpr.cpp
2034–2070

Done. Tho, I couldn't make it const (for the same reason I couldn't make getCurScopeDecl() const). And I wasn't sure about the interface:

std::vector<Token> ExpandMSPredefinedMacros(ArrayRef<Token> Toks);

vs

void ExpandMSPredefinedMacros(MutableArrayRef<Token> Toks);
2043

Done.

2051–2066

I think it might be easier to create a string_literal token directly here.

What do you mean? Is there a function which creates Token object from StringRef? Or is there a way to associate string literal value with a Token without PP? I would like to simplify it, but I haven't found other ways of achieving the same result.

I'm also not sure we need to use Lexer::Stringify

Well, as far as I can see StringLiteralParser expands escape sequences. So, I am just being too careful here.
If not using Lexer::Stringify do we want to assert that function name does not contain neither " not \ (which should not happen™)?

clang/test/Sema/ms_predefined_expr.cpp
12
clang/test/Analysis/eval-predefined-exprs.cpp
clang/test/AST/Interp/literals.cpp
clang/test/SemaCXX/source_location.cpp
clang/test/SemaCXX/predefined-expr.cpp
RIscRIpt updated this revision to Diff 540427.Jul 14 2023, 8:09 AM

Rebased onto main

tahonermann requested changes to this revision.Jul 14 2023, 11:56 AM
tahonermann added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
118–120

I think it makes since to be more explicit that the concatenation is with regard to string literals.

clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)
clang/include/clang/Parse/Parser.h
578–582 ↗(On Diff #538025)

I suggest passing getLangOpts() to isFunctionLocalPredefinedMacro (isUnexpandableMsMacro) and letting it determine whether the token is or is not one of these special not-really-a-string-literal macro things. This will facilitate additional such macros controlled by different options while also colocating the option check with the related tokens.

clang/include/clang/Sema/Sema.h
5715
clang/lib/Parse/ParseExpr.cpp
1301–1312

Since the conditional code is only relevant for some of the tokens here, I would prefer to see the other token kinds (kw___func__, kw___PRETTY_FUNCTION__) handled separately to avoid the conditional fallthrough. That means duplicating the calls to Actions.ActOnPredefinedExpr() and ConsumeToken() between the blocks, but that bothers me less than having to understand the complicated condition.

3282–3285

I think __func__ is not considered a string literal in Microsoft mode, so I think this comment needs some adjusting.

3292–3295

What is the reason for requiring non-concatenated predefined function local macros to be handled by ActOnPredefinedExpr()?

clang/lib/Sema/Sema.cpp
1494

getCurScopeDecl isn't a great name since this doesn't handle class or namespace scopes. How about getCurLocalScopeDecl? That name isn't quite right either since this can return a TranslationUnitDecl, but I think it matches the intent fairly well.

None of the callers need the actual TranslationUnitDecl that is returned. I think this function can return nullptr is the current scope isn't function local and callers can issue the relevant diagnostic in that case (thus avoiding the need to check that a translation unit decl was returned).

clang/lib/Sema/SemaExpr.cpp
1976–1979
2006–2009

A diagnostic is issued if the call to getCurScopeDecl() returned a translation unit declaration (at least in Microsoft mode). Does it make sense to pass a pointer to a TranslationUnitDecl here?

3710

This can just be a nullptr check with my suggested change to getCurScopeDecl().

clang/test/Sema/ms_predefined_expr.cpp
170

If we don't already have such tests, please add tests that exercise these macros at global, namespace, and class scope.

This revision now requires changes to proceed.Jul 14 2023, 11:56 AM
RIscRIpt updated this revision to Diff 540600.Jul 14 2023, 5:09 PM
RIscRIpt marked 17 inline comments as done.

Addressed review comments, rebased onto main.

clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

Thanks, I like the name. But shouldn't we reflect that we are referring to only Microsoft (unexpandable) macros? How about isFunctionLocalPredefinedMsMacro?

clang/include/clang/Parse/Parser.h
578–582 ↗(On Diff #538025)

@tahonermann, in theory it sounds good, but if I understood you correctly, the implementation seem weird to me:

inline bool isFunctionLocalPredefinedMsMacro(TokenKind K, const LangOptions& langOpts) {
  if (!langOpts.MicrosoftExt)
    return false;
  return K == tok::kw___FUNCTION__ || K == tok::kw___FUNCSIG__ ||
         K == tok::kw_L__FUNCTION__ || K == tok::kw_L__FUNCSIG__ ||
         K == tok::kw___FUNCDNAME__;
}

And I would have to add #include "LangOptions.h" in TokenKinds.h.

578–582 ↗(On Diff #538025)

If you are still concerned about usage of LangOptions in this context, and at the same time you are not against of having such function, I can offer the following alternatives:

  1. Make it a local lambda function in ParseStringLiteralExpression
  2. Make it a TU local function in ParseExpr.cpp

I went with (2) to do simplification of condition in switch statement below. Marking thread resolved, as it's no longer relevant.

clang/include/clang/Sema/Sema.h
5715

Renamed to ExpandFunctionLocalPredefinedMsMacros. If you think my addition of Ms is redundant, let me know.

clang/lib/Parse/ParseExpr.cpp
1301–1312

That's a valid point. And by handling two tokens separately allows simplification of the condition. Adjusted code.

3282–3285

Right, removed mention of __func__.

3292–3295

To ensure generation of the same AST as before my changes (with PredefinedExpr):

|   | `-VarDecl <col:2, col:19> col:13 a 'const char[2]' cinit
|   |   `-PredefinedExpr <col:19> 'const char[2]' __FUNCTION__
|   |     `-StringLiteral <col:19> 'const char[2]' "f"

This actually brings a question: do we want to wrap StringLiterals (into PredefinedExpr) which were formed from concatenation of string-literals and some predefined identifier(s)? But it does not really make any sense: concatenated string does not represent any of PredefinedExpr.

clang/lib/Sema/Sema.cpp
1494

Sounds good to me. The only alternative I can come up with is getCurFunctionScopeDecl, but is it any better?

Adjusted code as per your suggestions.

clang/lib/Sema/SemaExpr.cpp
1976–1979

Changed, thank you!

2006–2009

Shortly, yes, kind of. ComputeName can handle TranslationUnitDecl in which case it returns an empty string. This way we implicitly (without additional code) create empty string-literal Tokens which can be handled in StringLiteralParser. And we cannot pass non-string-literal tokens into StringLiteralParser.

3710

Done, but I had to undo removal of

currentDecl = Context.getTranslationUnitDecl();

but it is not a big deal.

cor3ntin added inline comments.Jul 15 2023, 4:50 AM
clang/lib/Parse/ParseExpr.cpp
3288–3296

Looking at that again, i think you can remove the assert (the one at the top should stay). which you did, excellent. sorry about that!

3292–3295

Yes, this make sense. I think it's better than adding more complexity / memory footprint to StringLiteral.

clang/lib/Sema/SemaExpr.cpp
2034–2070

I think the later let you use a small vector so it's probably more efficient. I'm find with either

2051–2066

Thanks! I really would get rid of Stringify here - I struggle to see how a function could be produced that has characters that cannot appear in an identifier. even asserting isn't very useful.

clang/test/Sema/ms_predefined_expr.cpp
12

I think it's worth add a few more tests in clang/test/SemaCXX/predefined-expr.cpp checking concatenations

RIscRIpt updated this revision to Diff 540741.Jul 15 2023, 2:39 PM
RIscRIpt marked 7 inline comments as done.

Addressed review comments, rebased onto main

clang/lib/Sema/SemaExpr.cpp
2051–2066

Replaced Stringify with enclosure into raw-string-literal. Despite raw-string-literals are available starting from C++11, as far as I can see StringLiteralParser does not check for the standard.
Let me know, if you believe we cannot abuse this fact, and we should keep using "regular string-literal" for this purpose.

clang/test/Sema/ms_predefined_expr.cpp
12

Added tests to check values of these string literals, as well as moved ms_wide_predefined_expr.cpp tests into ms_predefined_expr.cpp.

tahonermann requested changes to this revision.Jul 19 2023, 11:10 AM
tahonermann added inline comments.
clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

I don't think the Microsoft association is meaningful. Sure, some of the predefined macros are only treated differently when compiling in Microsoft mode, but some of those macros aren't Microsoft specific. Also, there might be macros provided by other implementations that we'll want to emulate some day.

clang/include/clang/Sema/Sema.h
5715

I don't think the Ms is redundant, but I do think it is unnecessary and slightly confusing (__FUNCTION__) is not a Microsoft macro.

clang/lib/Parse/ParseExpr.cpp
1307–1308

TokenIsLikeStringLiteral() checks MicrosoftExt, so the check here is redundant. This is an example of why I would like to see the MicrosoftExt checking pushed down to isFunctionLocalPredefinedMsMacro(); that really seems where that checking belongs to me.

3293–3294

Is there a missing check for MicrosoftExt here? This is another example of why I would prefer to see those checks pushed down to isFunctionLocalPredefinedMsMacro().

clang/lib/Sema/SemaExpr.cpp
1984–1985

This could use a comment since it isn't obvious why the translation unit declaration is used when not in a local scope declaration of some kind.

2006–2009

Ah, ok. And I see it even differentiates what name is produced for __PRETTY_FUNCTION__. That leaves me wondering what the right behavior should be at class and namespace scope, but maybe I'm better off not asking questions I don't want to know the answer to.

This revision now requires changes to proceed.Jul 19 2023, 11:10 AM
cor3ntin added inline comments.Jul 19 2023, 11:35 AM
clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

I think it is, there is currently no way to link isFunctionLocalPredefinedMacro to the MS feature. "MSPredefinedMacro" is pretty self explanatory, same reason we most often - but not always - use GNU in the name of function related to GNU extensions.
There are enough similar-sounding features and extensions that we should try to make it clear which feature we are talking about.

tahonermann added inline comments.Jul 19 2023, 1:21 PM
clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

Perhaps we still haven't found the right name then. With the name that I've been advocating, this function should also return true for tok::kw___PRETTY_FUNCTION__ even though that macro should not expand to something that can be concatenated with other string literals (whether compiling in Microsoft mode or not).

The Microsoft extension is the localized expansion to a string literal that can be concatenated with other string literals. We probably should isolate the conditions for that behavior to one place and if we do that, then I guess naming this function as specific to Microsoft mode is ok; we can always rename it later if it gets used for non-Microsoft extensions.

Here is my suggestion then:

// Return true if the token corresponds to a function local predefined
// macro that, in Microsoft modes, expands to a string literal (that can
// then be concatenated with other string literals).
inline bool isMsFunctionLocalStringLiteralMacro(TokenKind K, const LangOptions &langOpts) {
  return langOpts.MicrosoftExt && (
      K == tok::kw___FUNCTION__ || K == tok::kw_L__FUNCTION__ ||
      K == tok::kw___FUNCSIG__ || K == tok::kw_L__FUNCSIG__ ||
      K == tok::kw___FUNCDNAME__);
}

This will avoid the need for callers to check for MicrosoftExt; that is what I really want to avoid since it is easy to forget to do that check.

RIscRIpt added inline comments.Jul 19 2023, 3:24 PM
clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)
  1. By requiring user to pass LangOptions I think we can remove MS association (implying that there could be other non-msft cases in the future)
  2. We would have to include a LangOptions.h in TokenKinds.h, are we ok with this? Alternatively while this function is for msft cases only, we could pass bool MicrosoftExt

Personally, I like version with LangOptions and removal of MS.

RIscRIpt updated this revision to Diff 542412.Jul 20 2023, 3:25 AM
RIscRIpt marked 5 inline comments as done.

Addressed review comments, rebased onto main

clang/lib/Parse/ParseExpr.cpp
1301–1312

Having TokenIsLikeStringLiteral I think we can return to single case. I believe 3 values in condition is not difficult.

1307–1308

Let's consider this code:

if (!TokenIsLikeStringLiteral(NextToken(), getLangOpts())) {
  Res = Actions.ActOnPredefinedExpr(Tok.getLocation(), SavedKind);
  ConsumeToken();
  break;
}

The condition can be expanded as follows: !(isStringLiteral || (MsExt && NeededToken)). Let's consider MsExt is false, then we have: !isStringLiteral. So, if next token is StringLiteral we will try to concatenate it even without MsExt, which is invalid. Thus this seemingly redundant check is needed.

3293–3294

My intention for this assert is to ensure that we keep treating single predefined identifiers as PredefinedExpr (when we don't need to concat them) to preserve AST. I've adjusted the check to make it more clear.

RIscRIpt marked 3 inline comments as done.Jul 20 2023, 3:27 AM
tahonermann requested changes to this revision.Jul 21 2023, 2:26 PM

I think this is looking good. The only big thing I noticed is some code that looks like it should have been removed with the last set of changes. Otherwise, just some minor concerns and suggestions.

clang/lib/Parse/ParseExpr.cpp
1294–1296

I believe this code should be removed now; it is no longer reachable.

1301–1312

I think this is good, but how about a comment to make the intent more clear?

clang/lib/Sema/SemaExpr.cpp
1903
2015–2018

"EFLPM"? I'm assuming this is an attempt to avoid an accidental clash with the computed name. Does this suffice to ensure a clash can't happen? If not, can we do better? Per http://eel.is/c++draft/lex.string#nt:r-char and http://eel.is/c++draft/lex.charset#1, we have a lot of flexibility regarding which characters to use.

clang/test/Sema/ms_predefined_expr.cpp
115

How about testing some other encoding prefix combinations?

u"" __FUNCTION // Ok; UTF-16 string literal
u"" L__FUNCTION__ // ill-formed.
This revision now requires changes to proceed.Jul 21 2023, 2:26 PM
RIscRIpt updated this revision to Diff 543109.Jul 21 2023, 4:13 PM
RIscRIpt marked 5 inline comments as done.

Address review comments, rebase onto main

Thanks to your suggestion of testing different types, I realized clang does not support MSVC macros with u, u8, and U prefixes in addition to L__FUNCDNAME. By the way, clang replicates MSVC behavior a little bit incorrectly: L__FUNCTION__ is not a valid token for MSVC, however it becomes valid if used via macro concatenation (see #define WIDE).
I think addition of support of new macros as well as bug fix (if possible) should be submitted separately. I'll open-up an issue, and submit a PR later.

clang/lib/Sema/SemaExpr.cpp
2015–2018

At first I thought you were hinting me to use non-ascii characters for d-char-sequence. However, when I looked closely d-char-sequence is not as flexible as r-chars that you referenced: http://eel.is/c++draft/lex.string#nt:d-char and http://eel.is/c++draft/lex.charset#2

Here're my thoughts:

  1. I was afraid that there could be a possibility of a function name contain either quote (") or a backslash (\) despite it seemed impossible.
  2. At first I used Lexer::Stringify to make it bullet-proof. I didn't like it, neither did you (reviewers).
  3. I replaced Lexer::Stringify with raw-string-literal (d-char-sequence was chosen as acronym of current function name).
  4. Current EFLPM looks weird and cryptic. And we are limited to [lex.charset.basic] with exceptions (space and earlier chars are not allowed). I think, using a raw-string-literal without d-char-sequence would be enough, because problem could be caused only by two consecutive characters )", neither of which can appear in a function name.
clang/test/Sema/ms_predefined_expr.cpp
115

Good idea. I am not sure whether I should specify -std in the test command. I'll leave it without, because current default standard is C++17, and I believe it's not going to be decreased.

And I think there are enough tests of checking values of these macros. So, in tests for encoding I am going to check types.

cor3ntin added inline comments.Jul 24 2023, 12:33 AM
clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

By requiring user to pass LangOptions I think we can remove MS association

I don't think there is any motivation to future proof against hypotheticals, we can always refactor later

We would have to include a LangOptions.h in TokenKinds.h

This makes me uneasy, but i think we can move the function to LiteralSupport.h and include that in ParseExpr.cpp.

tahonermann added inline comments.Jul 24 2023, 1:27 PM
clang/lib/Sema/SemaExpr.cpp
2015–2018

Sorry about that; you are absolutely right that d-char-sequence is (much) more constrained than r-char-sequence.

For __FUNCSIG__, the generated text can include arbitrary text. For example, given this declaration:

void f(decltype(sizeof("()"))*)

the macro value is:

void __cdecl f(decltype(sizeof ("()")) *)

which does contain )".

I think we should do more to avoid any possibility of the resulting string being unparseable because the resulting diagnostics will be completely inscrutable. The best way to do that would be to avoid trying to produce a parseable string literal in the first place. Based on that and given that Sema::ActOnStringLiteral() immediately uses StringLiteralParser to produce an expanded string literal, I think we would be better off moving this expansion there such that these macros can be expanded to already cooked string literals that don't need to be parsed at all. This will mean having to modify the StringLiteralParser constructor to accept a Decl* pointer that will be passed getCurLocalScopeDecl() in Sema::ActOnStringLiteral(). StringLiteralParser can then assert if it ever encounters one of these function local predefined tokens when it hasn't been provided a corresponding Decl* to use with them.

clang/test/Sema/ms_predefined_expr.cpp
115

Thank you for adding those. I think we should check values for a couple of cases as well. Consider a function name that contains 𐀀 (U+10000 LINEAR B SYLLABLE B008 A). In UTF-8, that character requires four code units (F0 90 80 80) while in UTF-16 it requires two (D800 DC00). We'll want to ensure that such characters are properly converted between encodings.

void test_encoding𐀀() {
  ASSERT_EQ(u"" __FUNCTION__, u"test_encoding𐀀");
}
tahonermann added inline comments.Jul 24 2023, 1:34 PM
clang/test/Sema/ms_predefined_expr.cpp
170

Here is another case to test. MSVC accepts this.

void test_static_assert() {
  static_assert(true, __FUNCTION__);
}
cor3ntin added inline comments.Jul 24 2023, 3:18 PM
clang/lib/Sema/SemaExpr.cpp
2015–2018

I didn't think about that scenario but in that case the original idea to use Lexer::Stringify seems like a cleaner solution to me.
Adding that kind of logic to LiteralSupport creates a lot of coupling for not much gain that I can see.

In that light, Lexer::Stringify seems like the least worse option.

tahonermann added inline comments.Jul 25 2023, 7:54 AM
clang/lib/Sema/SemaExpr.cpp
2015–2018

Yeah, that might be the simpler solution. That matches how __FILE__, __BASE_FILE__, and __FILE_NAME__ are handled in Preprocessor::ExpandBuiltinMacro().

RIscRIpt updated this revision to Diff 544085.Jul 25 2023, 1:32 PM
RIscRIpt marked 6 inline comments as done.

Addressed review comments, rebased onto main

Noticable changes:

  1. Diagnostics message
  2. Support of expansion of these attributes in 2nd argument of static_assert
  3. Moved helper-functions to Lex/LiteralSupport
  4. Changed back to usage of Lexer::Stringify for function name expansion

Local lit clang/test succeed for both Debug and Release builds.

clang/include/clang/Basic/TokenKinds.h
87–93 ↗(On Diff #540427)

Sounds good to me. Due to other changes I also moved TokenIsLikeStringLiteral there.

I don't think there is any motivation to future proof against hypotheticals, we can always refactor later

Yes, not a strong argument, but I'd like to keep current name if you are not against.

clang/lib/Sema/SemaExpr.cpp
2015–2018

Thanks for pointing out good counter-example.

In that light, Lexer::Stringify seems like the least worse option.

Unfortunately, yes.

In regards other possible implementations, I commented previously here: https://reviews.llvm.org/D153914#4500807

to modify the StringLiteralParser constructor to accept a Decl*

This requires dependency (and linkage) of Lex with AST.

I am going to add your counter-examples to test, and change implementation back to Lexer::Stringify

clang/test/Sema/ms_predefined_expr.cpp
170

Thanks for finding this! I had to do some adjustments to support it. In addition I changed diagnostics message because it no longer made sense, and removed assert about "single predefined identifiers shall be handled by ActOnPredefinedExpr". Other code around references of err_expected_string_literal do not seem to require support of this MS ext.

RIscRIpt updated this revision to Diff 544088.Jul 25 2023, 1:47 PM

Removed irrelevant changes

clang/lib/Sema/SemaExpr.cpp
1982

I think I should remove this assertion so this function would be usable without MS ext, on the other hand it would be a noop without MS ext.

I think I'm happy with this. I noted two code changes that I think are no longer needed and offered some final suggestions on some names.

clang/include/clang/Basic/DiagnosticSemaKinds.td
118–120

For consistency with other names, I'll suggest some alternative names here. I think these better reflect the actual diagnostic message.

  • ext_expand_predef_ms -> ext_string_literal_from_predefined
  • MicrosoftConcatPredefined -> MicrosoftStringLiteralFromPredefined
clang/include/clang/Basic/TokenKinds.h
22–23 ↗(On Diff #544085)

It looks like this change is no longer needed.

clang/lib/Basic/TokenKinds.cpp
14–15 ↗(On Diff #544085)

This change appears to no longer be needed.

tahonermann added inline comments.Jul 25 2023, 2:07 PM
clang/lib/Sema/SemaExpr.cpp
1982

I suggest leaving it in solely because use of the function does impose some overhead in the construction of the std::vector that is returned. If that overhead can be avoided (I guess by passing the container to populate by reference as a separate argument and then return either the original ArrayRef or a new one referencing the populated container), then this could be made a no-op and the checks for MicrosoftExt at the call sites could be removed.

RIscRIpt updated this revision to Diff 544114.Jul 25 2023, 3:06 PM
RIscRIpt marked 4 inline comments as done.

Rename diagnostic messages constants

tahonermann accepted this revision.Jul 25 2023, 6:48 PM

I'm happy with this. I noted one last rename suggestion to maintain consistency, but am accepting. Please give Corentin a final chance to raise any concerns. Thank you for being so responsive through all the review iterations!

clang/include/clang/Basic/DiagnosticGroups.td
1210

We should probably name the diag group to match.

  • microsoft-concat-predefined -> microsoft-string-literal-from-predefined
This revision is now accepted and ready to land.Jul 25 2023, 6:48 PM
RIscRIpt updated this revision to Diff 544235.Jul 26 2023, 12:19 AM
RIscRIpt marked an inline comment as done.

Rename MicrosoftStringLiteralFromPredefined

Thank you for the review! I apologize for missing the small details; I should have noticed and addressed them myself.

RIscRIpt updated this revision to Diff 548274.Aug 8 2023, 10:21 AM

Rebased onto main, run local lit clang/tests. (bump for @cor3ntin)
As far as I understand this shall be merged into main by maintainers.
Please let me know if something else is expected from me.

Rebased onto main, run local lit clang/tests. (bump for @cor3ntin)
As far as I understand this shall be merged into main by maintainers.
Please let me know if something else is expected from me.

We will commit on your behalf, what name/email do you want us to use?
Thanks!

aaron.ballman added inline comments.Aug 8 2023, 12:08 PM
clang/lib/Sema/Sema.cpp
1494–1505

Made the function const because it's not writing to any fields, removed else because of preceding return.

clang/lib/Sema/SemaExpr.cpp
1987

Matching our usual naming conventions.

1999–2001

This will miss the diagnostic if the current declaration is a namespace instead of at file scope, right?

2034–2036

Probably worth a comment here explaining that ExpandedToks acts as the backing store for StringToks, which is why both are being assigned to here.

RIscRIpt updated this revision to Diff 548366.Aug 8 2023, 3:03 PM
RIscRIpt marked 4 inline comments as done.

Address review comments, rebase onto main

We will commit on your behalf, what name/email do you want us to use?
Thanks!

Thanks for asking; keep the name/email as-is in the commit you see: Author: Richard Dzenis <dzenis@richard.lv>

clang/lib/Sema/Sema.cpp
1494–1505

I would love to, and I tried. But unfortunately it's not possible without introducing more changes: all the member functions (except getCurFunctionOrMethodDecl()) are non-const.
Removed else.

clang/lib/Sema/SemaExpr.cpp
1999–2001

Right. But getCurLocalScopeDecl() returns function scope at most. See Note in a code comment above.

2006–2009

A diagnostic is issued if the call to getCurScopeDecl() returned a translation unit declaration (at least in Microsoft mode). Does it make sense to pass a pointer to a TranslationUnitDecl here?

aaron.ballman added inline comments.Aug 8 2023, 3:28 PM
clang/lib/Sema/Sema.cpp
1494–1505

Ah, thanks!

clang/lib/Sema/SemaExpr.cpp
1999–2001

Ah, exactly right, thank you!

2034–2035
RIscRIpt updated this revision to Diff 548502.Aug 9 2023, 12:53 AM
RIscRIpt marked an inline comment as done.

Updated comment; rebased onto main

clang/lib/Sema/SemaExpr.cpp
2034–2035

Thanks, that's better. Changed in both places.

This revision was automatically updated to reflect the committed changes.
clang/test/Sema/ms_predefined_expr.cpp