This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Implement P2741R3 - user-generated static_assert messages
ClosedPublic

Authored by cor3ntin on Jul 1 2023, 3:36 PM.

Diff Detail

Event Timeline

cor3ntin created this revision.Jul 1 2023, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2023, 3:36 PM
Herald added a subscriber: arphaman. · View Herald Transcript
cor3ntin requested review of this revision.Jul 1 2023, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2023, 3:36 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin retitled this revision from [WIP][Clang] Implement P2741R3 - user-generated static_assert messages Note that a few test fails because this patch depends on https://reviews.llvm.org/D105759 There are some issues in the wording, the implementation assume that both size and... to [WIP][Clang] Implement P2741R3 - user-generated static_assert messages.Jul 1 2023, 3:39 PM
cor3ntin edited the summary of this revision. (Show Details)
tbaeder added inline comments.
clang/lib/AST/ExprConstant.cpp
16387

Any reason the Expr* pointers here can't be const?

cor3ntin updated this revision to Diff 536570.Jul 1 2023, 11:33 PM
cor3ntin edited the summary of this revision. (Show Details)

Add feature test macro and complete ODRDiagsEmitter change

cor3ntin updated this revision to Diff 536571.Jul 2 2023, 12:06 AM

Make EvaluateCharPointerAsString parameters const.

cor3ntin added inline comments.Jul 2 2023, 12:09 AM
clang/lib/AST/ExprConstant.cpp
16387

Nope!. Note that I'm not entirely sold on this interface, having a function in Expr taking a bunch of function pointers does not feel amazing,
but Sema can't access to evaluation functions, so one way or another this functions needs to end up in ExprConstants.cpp and making it an Expr member doesn't seem more terribler than the alternative

shafik added a reviewer: Restricted Project.Jul 6 2023, 2:33 PM
cor3ntin updated this revision to Diff 538154.Jul 7 2023, 8:29 AM

Rebase.
This is now ready for review.

Note that after discussion with CWG, the consensus seems to be that the wording
is fine, an implementation has to behave

As if the full expression Message.data()[I] is called for each character.
Obviously this is not a viable implementation strategy so we materialize
intermediate expressions.

tbaeder added inline comments.Jul 7 2023, 8:43 AM
clang/lib/Sema/SemaDeclCXX.cpp
16896
16901

What if the message is StringLiteral but getCharByteWidth() doesn't return 1? We would get the err_static_assert_invalid_message because` !RD` is true, right?

16903
16915
16916
cor3ntin retitled this revision from [WIP][Clang] Implement P2741R3 - user-generated static_assert messages to [Clang] Implement P2741R3 - user-generated static_assert messages.Jul 7 2023, 8:59 AM
cor3ntin edited the summary of this revision. (Show Details)
cor3ntin updated this revision to Diff 538350.Jul 8 2023, 4:19 AM
cor3ntin marked 4 inline comments as done.

Address Timm's feedback

cor3ntin added inline comments.Jul 8 2023, 4:22 AM
clang/lib/Sema/SemaDeclCXX.cpp
16901

I forgot to change that while merging. StringLiteral are guaranteed to be unevaluated there

16903

Alas, LookupQualifiedName takes a non-const argument

barannikov88 added inline comments.Jul 8 2023, 9:54 AM
clang/include/clang/Sema/Sema.h
3893

Appears unused.

clang/lib/AST/ExprConstant.cpp
16413

This relies on host's CHAR_BIT >= target's CHAR_BIT, which isn't true for my target. Could you add an assertion?

clang/lib/Sema/SemaDeclCXX.cpp
16963
clang/test/SemaCXX/static-assert-cxx26.cpp
128

Should there be (negative?) tests with non-const data/size members and incorrect number of parameters? These conditions are checked in FindMember.

According to the current wording, the static_assert-message is either unevaluated string or an expression evaluated at compile time.
Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequeces, including '\x'.
Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.

According to the current wording, the static_assert-message is either unevaluated string or an expression evaluated at compile time.
Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequeces, including '\x'.
Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.

Not quite.
Unevaluated strings are always UTF-8 ( regardless of source file encoding). Evaluated strings are in the literal encoding which is always UTF-8 for clang.
This will change whenever we allow for different kinds of literal encodings per this RFC https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1

If and when that is the case we will have to convert back to UTF-8 before displaying - and then maybe convert back to the system locale depending on host.
Numeric escape sequences can then occur in evaluated strings and produce mojibake if the evaluated strings is not valid in the string literal encoding.
I don't believe that we would want to output static messages without conversion on any system as the diagnostics framework is very much geared towards UTF-8 and we want to keep supporting cross compilation.

So the process will be
source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

By the same account, casting 0-extended utf-8 to char is fine until such time clang support more than UTF-8. (which is one of the reasons we need to make sure clang conversions utilities can convert from and to utf-8)

Unevaluated strings were introduced in part to help identify what gets converted and what does not.

According to the current wording, the static_assert-message is either unevaluated string or an expression evaluated at compile time.
Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequeces, including '\x'.
Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.

Not quite.
Unevaluated strings are always UTF-8 ( regardless of source file encoding). Evaluated strings are in the literal encoding which is always UTF-8 for clang.
This will change whenever we allow for different kinds of literal encodings per this RFC https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1

If and when that is the case we will have to convert back to UTF-8 before displaying - and then maybe convert back to the system locale depending on host.
Numeric escape sequences can then occur in evaluated strings and produce mojibake if the evaluated strings is not valid in the string literal encoding.
I don't believe that we would want to output static messages without conversion on any system as the diagnostics framework is very much geared towards UTF-8 and we want to keep supporting cross compilation.

So the process will be
source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

Thanks for your reply, I think I see the idea.

By the same account, casting 0-extended utf-8 to char is fine until such time clang support more than UTF-8. (which is one of the reasons we need to make sure clang conversions utilities can convert from and to utf-8)

Unevaluated strings were introduced in part to help identify what gets converted and what does not.

It is a bit strange that the string in static_assert(false, "й") is not converted, while it is converted in static_assert(false, std::string_view("й")).
It might be possible to achieve identical diagnostic output even with -fexec-charset supported (which would only affect the second form),
but right now I'm confused by the distinction… Why don't always evaluate the message?

but right now I'm confused by the distinction… Why don't always evaluate the message?

2 reasons

  • it would be a rather important breaking change for compiler who don't always use utf-8 at their literal encoding
  • we do not want to limit static_assert to the capabilities of the literal encoding.

That oddity (there is no denying it's odd), was discussed by the C++ committee and we decided this was fine.
However we did also consider supporting char8_t* in data, which would allow users to use evaluated utf-8
strings.
We did decide against doing it right now but i suspect it will happen at some point.

cor3ntin updated this revision to Diff 538579.Jul 10 2023, 4:33 AM
cor3ntin marked 3 inline comments as done.

Address Sergei's feedback

  • Add more tests
  • Support non const member functions
  • Make sure diagnostics messages are never produced twice
  • Support returning intermediate objects from data()/size()

(Not a full review, I ran out of steam -- I wanted to get you some feedback that I already found though.)

clang/include/clang/AST/Expr.h
766–767

The function name confuses me a bit because a char pointer doesn't have a size expression. I was thinking "EvaluateCharArrayAsString" but that's also not right.

clang/include/clang/Basic/DiagnosticSemaKinds.td
86

This should also be re-flowed to 80 columns.

1548–1549

And re-flow to 80 columns. You should make similar changes in the other new diagnostics as well. (Use single quotes around syntax elements, start the string on its own line rather inline with the def, use "static assertion" instead of "static_assert" (The last point is largely because C has both _Static_assert and static_assert and we're avoiding figuring out which one was used. It may be moot as this is C++-only functionality, but it is more consistent with other static assert diagnostics.)

clang/lib/AST/ExprConstant.cpp
16413

Wouldn't adding the assertion cause you problems then? (FWIW, we only support CHAR_BIT == 8 currently.)

16417–16418

Rather than use an RAII object and destroy it manually, let's use {} to scope the RAII object appropriately.

clang/lib/Sema/SemaDeclCXX.cpp
16925–16930

It might be more friendly to tell the user which was missing, size or data.

shafik added a subscriber: shafik.Jul 11 2023, 6:34 PM
shafik added inline comments.
clang/test/SemaCXX/static-assert-cxx26.cpp
168

I feel like this diagnostic could be more detailed. Looking at the implementation you could at least point out which one is invalid if not both and you could probably add more details on top of that but I think at least pointing out which is invalid would be helpful.

shafik added inline comments.Jul 11 2023, 6:59 PM
clang/test/SemaCXX/static-assert-cxx26.cpp
2

How about a test case w/ data() and or size() has default arguments.

barannikov88 added inline comments.Jul 12 2023, 1:35 AM
clang/lib/AST/ExprConstant.cpp
16413

It will, the assertion will help find this place.
There are several places where it is asserted, and this was very helpful.

cor3ntin updated this revision to Diff 539467.Jul 12 2023, 3:17 AM
cor3ntin marked an inline comment as done.
  • Rename EvaluateCharPointerAsString => EvaluateCharRangeAsString
  • Improve diag messages
  • Form an overload set to diagnose invalid overloads
  • Add more tests
cor3ntin added inline comments.Jul 12 2023, 3:33 AM
clang/lib/AST/ExprConstant.cpp
16413

I replied to that in one of the comment i made on the review, but CHAR_BIT is not relevant there.
This cast is valid as long as C carries some kind of UTF-8 code unit - even if that code unit was 0 extended to fill an arbitrary long storage.
This would only be problematic if somehow the literal encoding of narrow literal was a wide encoding like UTF-16 or dec-kanji?
The assertion we would want is "does the literal encoding has codepoints greater than the size of char", which is not something we are going to get.

That being said I realized while replying that there is a bug there: it should cast to unsigned.

16417–16418

This seems to be the way to use this interface - mostly because destroy can fail and we want to detect that

cor3ntin updated this revision to Diff 539501.Jul 12 2023, 4:33 AM

Add tests for deleted methods and methods whose constraints are not satisfied.

aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
1556–1561

How about we combine these into one diagnostic given how similar and related they are?

clang/lib/AST/ExprConstant.cpp
16417–16418

Oh, what a strange thing to call RAII! :-D

clang/lib/Sema/SemaDeclCXX.cpp
16891–16902
16905
16945–16946
16963

Can be re-flowed to 80 columns, and I'll stop pointing out changes for this one -- just be sure to replace everything with Loc that you can.

17002–17003

Errrr, this seems like a pretty big surprise because evaluating that char range can fail. This means:

constexpr struct {
  size_t size() const;
  const char *data() const;
} d;
static_assert(false, d); // Fails, not because of the static assertion but because d is not valid
static_assert(true, d); // Succeeds, despite d not being valid

(Regardless of what we land on for a direction, I think this is a good test case to add.)

I don't know that this is reasonable QoI. Most static assertions pass rather than fail, so I think this will hide bugs from users in unexpected ways. I understand there may be concerns about compile time overhead, but those concerns seem misplaced in the absence of evidence: the extra overhead is something the user is opting into explicitly by using constexpr features and constexpr evaluation is well known to slow down compile times. Further, given that many (most?) static assert messages are relatively short (e.g., not kilobytes or megabytes of text), that extra overhead is likely negligible to begin with in this case, at least on an individual assert basis. The downside is, because most static assertions pass, that evaluation is also unnecessary in most cases, and it could add up if there's a lot of static assertions. However, given that users add static assertions to tell them when code is incorrect (assumptions invalidated, etc), it seems pretty counter-productive to hide bugs within the static assertion itself.

What do others think? @hubert.reinterpretcast @tahonermann @erichkeane @ldionne @shafik

If there's some agreement that we'd like to see diagnostics here, my initial preference is for an error diagnostic because a static assertion is only useful if both operands are valid constant expressions, but I could probably live with a warning. However, I think that an error would require filing a DR with WG21 otherwise there are conformance issues.

In the meantime, to keep this review moving without waiting for committee deliberations, I think we can evaluate the constant expression to determine it's validity and report issues as a warning (potentially one that defaults to an error) if others agree that a diagnostic is warranted here.

clang/test/SemaCXX/static-assert-cxx26.cpp
89–99

Use of true here means that we missed the fact that MessageFromConvertible() is invalid; not only are the size and data methods not constexpr, the ConvertibleToInt and ConvertibleToCharPtr classes do not have constexpr conversion operators. That could be intentional, but we should definitely have a false test that tests the conversion operators are properly called and work as expected when they're correct.

111–112

This diagnostic combination is a little bit jarring to me. The first one is telling the user that the message is incorrect and the second one tells the user precisely what that message was, so they're almost contradictory in some ways. However, I'm not certain if others think this is jarring as well.

If we think that, perhaps this sort of circumstance should just say the static assertion failed without the message part, even though we could figure out something as the message in this particular instance.

cor3ntin marked 3 inline comments as done.Jul 12 2023, 9:14 AM
cor3ntin added inline comments.
clang/test/SemaCXX/static-assert-cxx26.cpp
111–112

I'm mostly neutral on other comment but i disagree with that.
If we can produce a useful message, we should definitely use it otherwise it just hides useful information from the user - presumably information tried (and failed) to make pretty.

because data() is a pointer there are a few cases where we can form a diagnostic despite errors:
issue in the destructor (we have a full message), as above, and size() returning something greater than what is reachable from data (in which case we still get a full message).
For either of these scenarios we want to produce a message, because we can, and we also want to inform the user about the fact that the ast may be leaking memory or that something is wrong with size.

Other cases include recoverable errors from evaluating the message, data or size(), for example https://godbolt.org/z/aj4aKMMqn

Note that all of these were discussed at length in WG21, and well, whether a message is produced or not is QOI, so whatever we do there is conforming, i just think we should provide as useful messages as we can.

aaron.ballman added inline comments.Jul 12 2023, 9:23 AM
clang/test/SemaCXX/static-assert-cxx26.cpp
111–112

Okay, I can see the logic there. My concern was mostly around whether we can display the message in all cases or not. e.g., can the constexpr evaluation fail such that the data returned is effectively garbage and thus we display gibberish to the user?

e.g., https://godbolt.org/z/Wx74as4Gh

(This one doesn't show gibberish, but... we didn't exactly find a message either.)

aaron.ballman added inline comments.Jul 12 2023, 9:51 AM
clang/test/SemaCXX/static-assert-cxx26.cpp
241

Another test case to consider adding:

consteval const char *oops() {
  const char *ptr; // Garbage
  return ptr;
}

static_assert(false, oops()); // This should get the static assertion failure and the invalid message failure

(I think this holds because "M.data(), implicitly converted to the type ”pointer to const char”, shall be a core constant expression and let D denote the converted expression," and http://eel.is/c++draft/expr.const#5.8 means this wouldn't qualify as a core constant expression, I believe. At least, that's how non-static assert code behaves: https://godbolt.org/z/aKrfhr9Pj)

shafik added inline comments.Jul 12 2023, 10:05 PM
clang/lib/AST/ExprConstant.cpp
16408

Why are we specifically checking !isInt() what Kind do we expect?

clang/lib/Sema/SemaDeclCXX.cpp
16923
16936
16955
16957

@aaron.ballman @shafik Thanks! I'll address the small nitpicks one we all agree on a direction on whether to have a warning/error/nothing on static_assert(true, non_constant_expression)

tbaeder added inline comments.Jul 13 2023, 12:58 AM
clang/lib/Sema/SemaDeclCXX.cpp
17002–17003

I don't know that this is reasonable QoI. Most static assertions pass rather than fail, so I think this will hide bugs from users in unexpected ways.

It might be true that they usually pass, but as assertions, their purpose is to eventually fail. Just like the runtime assertions in clang usually pass but also fail all the time (see the bug tracker). Once they fail, it would be pretty disappointing if producing the error message itself would produce yet another error message...

17012–17013
aaron.ballman added inline comments.Jul 13 2023, 4:54 AM
clang/lib/Sema/SemaDeclCXX.cpp
17002–17003

Once they fail, it would be pretty disappointing if producing the error message itself would produce yet another error message...

That's a good way of restating why I don't think this is good QoI.

tbaeder added inline comments.Jul 13 2023, 5:50 AM
clang/lib/Sema/SemaDeclCXX.cpp
17063

Should work, shouldn't it?

cor3ntin updated this revision to Diff 542022.Jul 19 2023, 7:30 AM
cor3ntin marked 17 inline comments as done.
  • Display a warning, defaulted as error, if the message cannot be evaluated in

a static assertion that does not fail

  • Address Shafik's, Timm's and Aaron's feedbacks

This is looking pretty close to good, but there were some outstanding questions still.

clang/include/clang/Basic/DiagnosticSemaKinds.td
86

It looks like these edits were missed.

clang/lib/AST/DeclPrinter.cpp
952–953

Should there be an else clause here that does D->getMessage()->prettyPrint() so that we print out *something*? Otherwise, there should be a fixme comment about not printing out non-string literal operands.

clang/lib/AST/ExprConstant.cpp
16408

I think the caller already validates that we have a pointer to the correct kind of string, so I kind of think the isInt() check is unnecessary. If it's not, can you show a test case where it kicks in?

clang/test/SemaCXX/static-assert-cxx26.cpp
269

One more test case to consider: do we properly handle dependent expressions in the assert itself? e.g.,

template <typename Ty>
struct S {
  static_assert(false, Ty{});
};

struct Frobble {
  constexpr int size() const { return 5; }
  consterxpr const char *data() const { return "hello"; }
};

S<Frobble> s; // Good, fails with "hello"
S<int> t; // Incorrect
cor3ntin updated this revision to Diff 542102.Jul 19 2023, 10:36 AM
cor3ntin marked an inline comment as done.
  • Add test for dependent message
  • always dump the message even if it's not a string literal
  • remove superflous isInt
  • fix diag message
clang/lib/AST/ExprConstant.cpp
16408

I was thinking of indeterminate values but i can't find a way to get into that situation, so i think this is superfluous indeed

cor3ntin added inline comments.Jul 19 2023, 10:39 AM
clang/test/SemaCXX/static-assert-cxx26.cpp
269

note that i did split the test in two to better show the appertainance of each diag.

aaron.ballman accepted this revision.Jul 19 2023, 10:52 AM

LGTM assuming precommit CI doesn't come back with any surprises.

This revision is now accepted and ready to land.Jul 19 2023, 10:52 AM
This revision was landed with ongoing or failed builds.Jul 19 2023, 11:33 PM
This revision was automatically updated to reflect the committed changes.