This is an archive of the discontinued LLVM Phabricator instance.

[clang] Make predefined expressions string literals under -fms-extensions
ClosedPublic

Authored by aeubanks on Mar 23 2023, 3:18 PM.

Diff Detail

Event Timeline

aeubanks created this revision.Mar 23 2023, 3:18 PM
Herald added a project: Restricted Project. · View Herald Transcript
aeubanks requested review of this revision.Mar 23 2023, 3:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 3:18 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

+mstorsjo is this okay for mingw mode too?

clang/lib/Sema/SemaExpr.cpp
3558

Since this becomes an "else after return" it could be dropped.

clang/test/CodeGen/predefined-expr.c
4 ↗(On Diff #507896)

Both invocations use the Itanium abi, so the check prefix is a bit confusing. How about DEFAULT and MS?

clang/test/Sema/ms_predefined_expr.cpp
5

ultra nit: the 'void' param makes this look like c code. Should it go in a .c file instead of .cpp?

clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
1062 ↗(On Diff #507896)

Could the same check be used in ASTImporterTest.cpp?

aaron.ballman requested changes to this revision.Mar 24 2023, 6:03 AM

This should also have a release note, eventually.

clang/lib/Sema/SemaExpr.cpp
3585–3590

This is incorrect -- these are still predefined expressions even if they're a string literal. Otherwise, we lose AST fidelity and things like the predefinedExpr() AST matcher don't work (https://github.com/llvm/llvm-project/blob/main/clang/include/clang/ASTMatchers/ASTMatchers.h#L2697), and -ast-print will print the wrong thing (https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/StmtPrinter.cpp#L1248).

This revision now requires changes to proceed.Mar 24 2023, 6:03 AM

+mstorsjo is this okay for mingw mode too?

I believe the current form of the patch is fine - i.e. disabled by default in mingw mode, but enabled if the extra MS language extensions are enabled. (I didn't check myself how those options map to LangOpts.MicrosoftExt but I presume it works as I described above.)

clang/include/clang/Testing/TestClangConfig.h
61 ↗(On Diff #507896)

I'm not entirely sure of the use context of this function (only specific tests?), but mingw targets with a differen triple are win32 too. But then they could also be using an entirely different architecture than x86_64, so I guess this might be ok too if we're ready to tweak it when a need arises?

aeubanks added inline comments.Mar 24 2023, 7:58 AM
clang/lib/Sema/SemaExpr.cpp
3585–3590

how would you structure this? an almost identical PredefinedExpr class that also subclasses StringLiteral? or special case all the places that check for StringLiteral to also check for PredefinedExpr (the case I was mostly looking at was initializing a char array with predefined expressions but presumably there are more places that matter)

aaron.ballman added inline comments.Mar 24 2023, 8:21 AM
clang/lib/Sema/SemaExpr.cpp
3585–3590

We can't inherit from StringLiteral (it's a final class because it has TrailingObjects), so that might be the only viable approach. However, there's a fair number of places in the compiler where we care if something is or isn't a StringLiteral, so I can see why we'd want to avoid that if possible.

Oye, but it seems that even MSVC is not consistent with whether the predefined expression is a string literal or not: https://godbolt.org/z/jzY8cz79d

cor3ntin added inline comments.
clang/lib/Sema/SemaExpr.cpp
3585–3590

I think these should be plain old string literals, but for the purpose or AST matchers and such, maybe we could have a "predefined macro" bit, or even an additional object that stores the name.

aeubanks added inline comments.Mar 24 2023, 10:54 AM
clang/lib/Sema/SemaExpr.cpp
3585–3590

Oye, but it seems that even MSVC is not consistent with whether the predefined expression is a string literal or not: https://godbolt.org/z/jzY8cz79d

the other predefined expressions work, that looks like a bug on MSVC's part

I think these should be plain old string literals, but for the purpose or AST matchers and such, maybe we could have a "predefined macro" bit, or even an additional object that stores the name.

what do you mean by "an additional object that stores the name"?

I started trying to add a bit to StringLiteral that signifies if it's an MSVC predefined expression. being new to ast matchers, I'm having trouble figuring out how to make a matcher that matches both PredefinedExpr and some StringLiterals but also behaves like a VariadicDynCastAllOfMatcher.

something I noticed in one of the tests [1] is that there's a check

predefinedExpr(hasType(asString("const char[4]")),
                                     has(stringLiteral()))

thinking about the has(stringLiteral()), if somebody wants to get the literal for a predefined expr, presumably right now they're looking for a string literal child, but if we directly match a string literal then that would break? WDYT?

[1] https://github.com/llvm/llvm-project/blob/75ca15fcbb2e1b3671e41f971a000c6d59f5e5ae/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp#L1064

rsmith added a subscriber: rsmith.Mar 24 2023, 3:21 PM
rsmith added inline comments.
clang/lib/Sema/SemaExpr.cpp
3585–3590

I wonder if we could keep the representation as-is, but add an "is transparent" bit on PredefinedExpr that's set for string literal predefined expressions under -fms-extensions, and make IgnoreParens step over transparent PredefinedExprs. That seems like it'd exactly match the MS behavior, which seems to allow (eg) __func__ to appear anywhere a parenthesized string literal can appear.

aaron.ballman added inline comments.Mar 27 2023, 10:57 AM
clang/lib/Sema/SemaExpr.cpp
3585–3590

That seems like a good approach to me. We've got spare bits in PredefinedExprBitfields, so this won't impact AST size, which is nice.

aeubanks updated this revision to Diff 513335.Apr 13 2023, 1:13 PM

use IgnoreParens instead

aeubanks updated this revision to Diff 513337.Apr 13 2023, 1:15 PM

add comment

clang/lib/Sema/SemaExpr.cpp
3585–3590

much much cleaner, thanks!

I think you're missing changes in ASTReaderStmt.cpp and ASTWriterStmt.cpp, so serialization through modules or PCH won't work without that.

clang/include/clang/AST/Expr.h
1996

I think we should keep PredefinedExpr general, so how about IsTransparent instead? Then we can add a comment above Create() that explains what IsTransparent does.

(This makes it easier for us to add predefined expressions that aren't string literals but are still transparently handled.)

aeubanks updated this revision to Diff 518548.May 1 2023, 1:19 PM

address comments

aeubanks marked an inline comment as done.May 1 2023, 1:20 PM

I think you're missing changes in ASTReaderStmt.cpp and ASTWriterStmt.cpp, so serialization through modules or PCH won't work without that.

done. how would this sort of change be tested?

I think you're missing changes in ASTReaderStmt.cpp and ASTWriterStmt.cpp, so serialization through modules or PCH won't work without that.

done. how would this sort of change be tested?

Thanks! Typically with a test using modules or PCH, so something along these lines: https://github.com/llvm/llvm-project/blob/main/clang/test/Modules/cxx20-10-5-ex1.cpp and/or https://github.com/llvm/llvm-project/blob/main/clang/test/PCH/cxx-namespaces.cpp

aeubanks updated this revision to Diff 518777.May 2 2023, 9:50 AM

add module test

aaron.ballman added inline comments.May 2 2023, 11:57 AM
clang/test/Modules/predefined.cpp
6 ↗(On Diff #518777)

Er, should we have a -verify on this as well as // expected-no-diagnostics?

clang/test/Sema/ms_predefined_expr.cpp
6–10

Apologies for not noticing this earlier, but because this code isn't portable (for the standard predefined identifiers), I think we should also have a -pedantic test that ensures we get a diagnostic about accepting this code being a Microsoft extension.

I would recommend something along the lines of: initializing an array from a '%0' predefined identifier is a Microsoft extension put into a new warning group named -Wmicrosoft-init-from-predefined which is added to the -Wmicrosoft warning group.

aeubanks updated this revision to Diff 518887.May 2 2023, 3:29 PM

add warning

added the warning, not sure if this needs more tests for testing the interaction between -pedantic, -Wmicrosoft, -Wmicrosoft-init-from-predefined or if that's already assumed to work

clang/test/Modules/predefined.cpp
6 ↗(On Diff #518777)

isn't that tested by the Sema test?

added the warning, not sure if this needs more tests for testing the interaction between -pedantic, -Wmicrosoft, -Wmicrosoft-init-from-predefined or if that's already assumed to work

That's generally assumed to work, really the only question is whether we want Extension (only warned in -pedantic) or ExtWarn (warned by default). I think most of our Microsoft extension warnings are ExtWarn and it seems defensible for this to be on by default, but it'd be nice to know just how chatty this is on real world code bases too. I have the sneaking suspicion this code pattern doesn't come up often enough for this to be "too chatty", but if we find it triggers on popular libraries we might want to consider dialing it back to just Extension.

clang/test/Modules/predefined.cpp
6 ↗(On Diff #518777)

Not quite. The Sema test does test a lot of the functionality, but this is testing whether we import the AST node properly and can still use it, so it's testing a different code path for generating the AST nodes used for doing semantic checks.

aeubanks updated this revision to Diff 519215.May 3 2023, 12:46 PM

add -verify to module test

Precommit CI found some related failures that need to be addressed, but the functional changes are all looking good to me. Be sure to also add a release note for the fix as well.

aeubanks updated this revision to Diff 519613.May 4 2023, 12:40 PM

only diagnose if we're initializing an array

aaron.ballman accepted this revision.May 5 2023, 9:18 AM

Presuming that the issues found by precommit CI were resolved (current failures are unrelated to your changes), this LGTM but still needs a release note.

This revision is now accepted and ready to land.May 5 2023, 9:18 AM

added a release note

This revision was landed with ongoing or failed builds.May 7 2023, 11:46 AM
This revision was automatically updated to reflect the committed changes.
aeubanks reopened this revision.May 9 2023, 10:30 AM

the following crashes with this patch:

struct StringRef {
  StringRef(const char *);
};
template <typename T>
StringRef getTypeName() {
  StringRef s = __func__;
}

clang -cc1 -fms-extensions -std=c++17 -x c++ a.cpp -fsyntax-only

This revision is now accepted and ready to land.May 9 2023, 10:30 AM

Sorry for the late comments.

clang/lib/Sema/SemaInit.cpp
167–178

The duplication between this and IgnoreParensSingleStep is a code smell; can we expose IgnoreParensSingleStep and call it from here?

8446

We won't reach this case for const char arr[] = {__func__}; because in that case Args[0] will be an InitListExpr instead. Right now we accept this with no warning in MS extensions mode:

void f() {
  const char c[] = {__func__};
}

We should perform this check in the handling of SK_StringInit instead, in order to properly address that case.

Also, this will only catch the case where the argument is an unparenthesized predefined expression. You'll need to do something like repeatedly calling IgnoreParensSingleStep looking for a PredefinedExpr to cope with things like:

void g() {
  const char c[] = (__func__);
  const char d[] = _Generic(0, int: __func__);
}