This is an archive of the discontinued LLVM Phabricator instance.

[Clang] CWG1473: do not err on the lack of space after operator""
AcceptedPublic

Authored by rZhBoYao on Jun 16 2023, 10:43 AM.

Details

Summary

[Clang] CWG1473: do not err on the lack of space after operator""

In addition:

  1. Fix tests for CWG2521 deprecation warning.
  2. Enable -Wdeprecated-literal-operator by default.

Side note: GCC stopped diagnosing this since GCC 4.9

Diff Detail

Event Timeline

rZhBoYao created this revision.Jun 16 2023, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 10:43 AM
Herald added a subscriber: jvesely. · View Herald Transcript
rZhBoYao requested review of this revision.Jun 16 2023, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 10:43 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
rZhBoYao updated this revision to Diff 538688.Jul 10 2023, 9:12 AM
rZhBoYao edited the summary of this revision. (Show Details)

Reverse the dependency chain.

rZhBoYao updated this revision to Diff 538707.Jul 10 2023, 9:52 AM
Endill added inline comments.Jul 11 2023, 6:18 AM
clang/test/CXX/drs/dr14xx.cpp
491

Can you move this down, so that offset is negative (after @)?

rZhBoYao updated this revision to Diff 540104.Jul 13 2023, 10:40 AM
rZhBoYao marked an inline comment as done.
rZhBoYao edited the summary of this revision. (Show Details)
rZhBoYao added a reviewer: Restricted Project.Jul 17 2023, 11:24 AM

There seems to be no clear objection to this and https://reviews.llvm.org/D152632 and the CI are passing for both. Any chance that I merge these two before llvm 17 branch out (IIRC the next Monday)?

Gentle ping :)

aaron.ballman accepted this revision.Aug 15 2023, 6:51 AM

LGTM aside from some minor concerns.

There seems to be no clear objection to this and https://reviews.llvm.org/D152632 and the CI are passing for both. Any chance that I merge these two before llvm 17 branch out (IIRC the next Monday)?

As this isn't fixing a regression, I think it'd be better to let it bake a while longer in Clang 18 (unless there's some extenuating circumstances where we need to land this sooner).

clang/test/SemaCXX/reserved-identifier.cpp
91–92

No need to have a duplicate test.

clang/www/cxx_dr_status.html
8648
This revision is now accepted and ready to land.Aug 15 2023, 6:51 AM
rZhBoYao updated this revision to Diff 551138.Aug 17 2023, 8:08 AM
rZhBoYao marked 2 inline comments as done.

Thank Aaron and Vlad for reviewing this! Just updating the diff to reflect the final version.

This revision was landed with ongoing or failed builds.Aug 17 2023, 8:12 AM
This revision was automatically updated to reflect the committed changes.

Unfortunately the option -Wno-reserved-user-defined-literal fails after this:

#define MYTHING "_something_"

const char* f() {
  return "ONE"MYTHING"TWO";
}
$ clang -Wno-reserved-user-defined-literal repro.cxx
repro.cxx:4:15: error: no matching literal operator for call to 'operator""MYTHING' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template
    4 |   return "ONE"MYTHING"TWO";
      |               ^
1 error generated.

Unfortunately the option -Wno-reserved-user-defined-literal fails after this:

#define MYTHING "_something_"

const char* f() {
  return "ONE"MYTHING"TWO";
}
$ clang -Wno-reserved-user-defined-literal repro.cxx
repro.cxx:4:15: error: no matching literal operator for call to 'operator""MYTHING' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template
    4 |   return "ONE"MYTHING"TWO";
      |               ^
1 error generated.

This is conforming right? Correct me if I'm wrong. My reading of https://eel.is/c++draft/lex.pptoken#3.3 is that "ONE"MYTHING"TWO" is a single preprocessing-token during phase 3 (https://eel.is/c++draft/lex.phases#1.3). Can @aaron.ballman confirm this?

Unfortunately the option -Wno-reserved-user-defined-literal fails after this:

#define MYTHING "_something_"

const char* f() {
  return "ONE"MYTHING"TWO";
}
$ clang -Wno-reserved-user-defined-literal repro.cxx
repro.cxx:4:15: error: no matching literal operator for call to 'operator""MYTHING' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template
    4 |   return "ONE"MYTHING"TWO";
      |               ^
1 error generated.

This is conforming right? Correct me if I'm wrong. My reading of https://eel.is/c++draft/lex.pptoken#3.3 is that "ONE"MYTHING"TWO" is a single preprocessing-token during phase 3 (https://eel.is/c++draft/lex.phases#1.3). Can @aaron.ballman confirm this?

The diagnostic behavior is correct. MYTHING doesn't get expanded until phase 4 (http://eel.is/c++draft/lex.phases#1.4), so this appears as "ONE"MYTHING as a single preprocessor token: https://eel.is/c++draft/lex.ext#nt:user-defined-string-literal and that token is an invalid UDL.

Unfortunately the option -Wno-reserved-user-defined-literal fails after this:

#define MYTHING "_something_"

const char* f() {
  return "ONE"MYTHING"TWO";
}
$ clang -Wno-reserved-user-defined-literal repro.cxx
repro.cxx:4:15: error: no matching literal operator for call to 'operator""MYTHING' with arguments of types 'const char *' and 'unsigned long', and no matching literal operator template
    4 |   return "ONE"MYTHING"TWO";
      |               ^
1 error generated.

This is conforming right? Correct me if I'm wrong. My reading of https://eel.is/c++draft/lex.pptoken#3.3 is that "ONE"MYTHING"TWO" is a single preprocessing-token during phase 3 (https://eel.is/c++draft/lex.phases#1.3). Can @aaron.ballman confirm this?

The diagnostic behavior is correct. MYTHING doesn't get expanded until phase 4 (http://eel.is/c++draft/lex.phases#1.4), so this appears as "ONE"MYTHING as a single preprocessor token: https://eel.is/c++draft/lex.ext#nt:user-defined-string-literal and that token is an invalid UDL.

IIUC, the question is not whether the diagnostic is correct, but rather why -Wno-reserved-user-defined-literal does not workaround the breakage. Is that right?

An example of this in the wild is older versions of swig: https://github.com/swig/swig/blob/939dd5e1c8c17e5f8b38747bf18e9041ab5f377e/Source/Modules/php.cxx#L1724

The diagnostic behavior is correct. MYTHING doesn't get expanded until phase 4 (http://eel.is/c++draft/lex.phases#1.4), so this appears as "ONE"MYTHING as a single preprocessor token: https://eel.is/c++draft/lex.ext#nt:user-defined-string-literal and that token is an invalid UDL.

Correct per the specification, but the purpose of -Wno-reserved-user-defined-literal is to precisely to permit such pre-C++11 code like the example above to continue to compile. This change breaks that option. (Note that it is an error-by-default "warning".)

If proper spec-conformance means we can no longer support the ability to allow such out-of-spec pre-c++11 code to work anymore, that's probably OK...but, in that case, we also need to eliminate the warning option, and mention the change in the release notes.

However, it's unclear to me why we need to do so, since the spec says you're not allowed to define UDL without an underscore prefix
float operator ""E(const char*); // ill-formed, no diagnostic required:

Yet, what this change does, and why it breaks the above functionality, is that it now allows defining and calling user-defined literal operators which don't begin with an underscore. That seems like a potentially undesirable change?

And, if that _was_ an intended change, then we have other diagnostics which need to be fixed up now:

const char*operator ""foo(const char *, unsigned long);
const char* f() {
  return "hi"foo;
}

emits <source>:1:12: warning: user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator [-Wuser-defined-literals] and yet, now it does invoke the operator...

If proper spec-conformance means we can no longer support the ability to allow such out-of-spec pre-c++11 code to work anymore, that's probably OK...but, in that case, we also need to eliminate the warning option, and mention the change in the release notes.

Works in C++98 mode tho. TIL people exploit this to mix pre-C++11 code into modern C++. I agree that a reminder for those people in the release note is needed.

And, if that _was_ an intended change, then we have other diagnostics which need to be fixed up now

Agreed

IIUC, the question is not whether the diagnostic is correct, but rather why -Wno-reserved-user-defined-literal does not workaround the breakage. Is that right?

An example of this in the wild is older versions of swig: https://github.com/swig/swig/blob/939dd5e1c8c17e5f8b38747bf18e9041ab5f377e/Source/Modules/php.cxx#L1724

We simply stop pretending a whitespace precedes an invalid ud-suffix as that affects the grammar production and therefore diagnosis in Sema.
IMHO, people should stop using -Wno-reserved-user-defined-literal and exploiting the addition of a whitespace to mingle pre-c++11 and post-c++11 code.
What if a programmer is really trying to call operator""b here (albeit ill-formed):

const char* operator""b(const char*, decltype(sizeof 0));
const char* f() {
#define b "a"
  return "ONE"b; // NOW: IFNDR but calls operator""b
                 //
                 // BEFORE: string concat by exploiting the impl of
                 // ext_reserved_user_defined_literal (controlled by
                 // -Wreserved-user-defined-literal diag group)
}

What if a programmer is really trying to call operator""b here (albeit ill-formed)

Because that code is ill-formed, Clang diagnosed it with an error by default. Isn't that preferable to accepting it by default?

And disabling the error treats it as string-concat, because that enables users to remain compatible with code written assuming C++98 semantics for macros in string concatenation. (This support was a quite-intentionally added feature, not an accident of implementation!)

FWIW, GCC does the same thing, but with an on-by-default-warning that doesn't even default to error severity. I do think it's OK to remove support for this extension if spec-compliance requires it, but AFAICT, that's not the case here, so I think it would be preferable to preserve the previous behavior.

I’ll see what I can do regarding reviving the string concat behavior. It feels like that a more refined treatment than before can be achieved. Maybe adds an imaginary preceding whitespace only when we can find a macro with the same name.

The other common breakage I'm seeing is code that hasn't yet migrated from the "PRI" format macros, of which there's an example in LLVM itself: https://github.com/llvm/llvm-project/blob/6a0e536ccfef1f7bd64ee4153b4efc0aeecf28d4/clang/test/SemaCXX/cxx11-compat.cpp#L38

e.g.: https://godbolt.org/z/v3boc9E6T

IMHO, people should stop using -Wno-reserved-user-defined-literal and exploiting the addition of a whitespace to mingle pre-c++11 and post-c++11 code.

I mostly agree with this, at least the spirit of it, but do we usually remove the -Wno ability to suppress a common error because we don't like it? (Or justify it post-facto if we accidentally did that?) If removing this is important for some other aspect of clang development, it would be nice to have notice ahead of time that this option will go away. I guess +1 to what James said.

Sorry, I don't want to mislead: I just mean there's a handy example in clang's unit tests. I don't see any instances of this in LLVM non-test code.

hans added a subscriber: hans.Aug 22 2023, 11:07 AM

Given the concerns raised (the PRIuS one in https://godbolt.org/z/v3boc9E6T seems like a good example), and that the fix in D158372 doesn't seem straight-forward, could this be reverted to unbreak things until a revised version is ready?

@aaron.ballman what do you think?

Given the concerns raised (the PRIuS one in https://godbolt.org/z/v3boc9E6T seems like a good example), and that the fix in D158372 doesn't seem straight-forward, could this be reverted to unbreak things until a revised version is ready?

@aaron.ballman what do you think?

I agree this should be reverted from 17.x so we have more time to consider an appropriate path forward. Supporting evidence that this will likely be disruptive: https://sourcegraph.com/search?q=context:global+%5E%28%22%5B%5E%22%5D%2B%22PRI.*%29%24+lang:C%2B%2B+-file:.*test.*&patternType=regexp&case=yes&sm=0&groupBy=repo

hans added a comment.Aug 22 2023, 11:42 AM

I agree this should be reverted from 17.x so we have more time to consider an appropriate path forward. Supporting evidence that this will likely be disruptive: https://sourcegraph.com/search?q=context:global+%5E%28%22%5B%5E%22%5D%2B%22PRI.*%29%24+lang:C%2B%2B+-file:.*test.*&patternType=regexp&case=yes&sm=0&groupBy=repo

I don't believe this is on 17.x. I'm suggesting reverting to green on trunk.

I agree this should be reverted from 17.x so we have more time to consider an appropriate path forward. Supporting evidence that this will likely be disruptive: https://sourcegraph.com/search?q=context:global+%5E%28%22%5B%5E%22%5D%2B%22PRI.*%29%24+lang:C%2B%2B+-file:.*test.*&patternType=regexp&case=yes&sm=0&groupBy=repo

I don't believe this is on 17.x. I'm suggesting reverting to green on trunk.

Heh, review fatigue caught me; I even mentioned on this review that it should not go into 17.x so we get more bake time with it! Sorry for that.

I think it is beneficial for us to get more feedback on problems caused by the changes, in case they help us decide how we want to move forward solving them. However, if the changes on trunk are sufficiently disruptive that we probably won't get more (novel) feedback anyway, then I think a revert is reasonable.

rnk added a subscriber: rnk.Aug 22 2023, 6:21 PM

I went ahead and pushed a revert of this, since this change was pretty disruptive, at least in our experience. I could be convinced that it's worth sunsetting this extension for accepting C-style format string macro code, but that doesn't seem like the original intent of this change. Maybe that's the right thing to do, but it should be an intentionally considered change.

jyknight reopened this revision.Sep 22 2023, 5:54 AM

Reopening the review since it ended up reverted, but I do think we DO want to have at least some of the changes proposed here (and/or from the pending D158372). At the very least, the existing diagnostics are currently quite misleading/wrong, and the work here was helping to improve that.

Taking this example:

float operator ""XXX(const char *);

It results in three diagnostics:

<source>:1:18: error: invalid suffix on literal; C++11 requires a space between literal and identifier [-Wreserved-user-defined-literal]
    1 | float operator ""XXX(const char *);
      |                  ^
      |                   
<source>:1:18: warning: identifier 'XXX' preceded by whitespace in a literal operator declaration is deprecated [-Wdeprecated-literal-operator]
    1 | float operator ""XXX(const char *);
      |       ~~~~~~~~~~~^~~
      |       operator""XXX
<source>:1:7: warning: user-defined literal suffixes not starting with '_' are reserved; no literal will invoke this operator [-Wuser-defined-literals]
    1 | float operator ""XXX(const char *);
      |       ^

The first and third are enabled by default, the second is not.

The first only describes what to do if you wanted it to be string-concat (which would be invalid here...), not if you wanted it to be an UDL-operator definition. And it's not clear that "identifier" really means "macro expanding to a string that I assume you meant" in the message. I think we should not emit that message ONLY if there is a macro named "XXX".

The second is just incorrect -- there's already no space in the user’s source-code, the compiler just added one internally!

The third seems OK, although if we actually do permit calling such UDLs, would want to remove the last phrase...

Then, consider:

float operator ""XXX(const char *);
float x1 = ""XXX;

One might reasonably expect to build when -Wno-reserved-user-defined-literal is specified. But, currently, it does not.

I think a reasonable compromise behavior would be:

  1. If the literal suffix is nonstandard (that is: doesn't start with _, and isn't a standard-defined UDL suffix), then try macro-expanding the suffix.
  2. If there is an expansion, use the macro-expansion, instead of treating it as UDL, and diagnose as an error-by-default under -Wreserved-user-defined-literal.
  3. On the other hand, if there is no macro-expansion, then let it be parsed as a nonstandard UDL, and emit a different diagnostic message under error-by-default -Wreserved-user-defined-literal.
This revision is now accepted and ready to land.Sep 22 2023, 5:54 AM